Abstract. The blockchain paradigm when coupled with cryptographically-secured transactions has demonstrated its 
utility through a number of projects, not the least being Bitcoin. Each such project can be seen as a simple application 
on a decentralised, but singleton, compute resource. We can call this paradigm a transactional singleton machine with 
shared-state. 

Ethereum implements this paradigm in a generalised manner. Furthermore it provides a plurality of such resources, 
each with a distinct state and operating code but able to interact through a message-passing framework with others. 
We discuss its design, implementation issues, the opportunities it provides and the future hurdles we envisage. 


1. Introduction 

With ubiquitous internet connections in most places 
of the world, global information transmission has become 
incredibly cheap. Technology-rooted movements like Bit- 
coin have demonstrated, through the power of the default, 
consensus mechanisms and voluntary respect of the social 
contract that it is possible to use the internet to make 
a decentralised value-transfer system, shared across the 
world and virtually free to use. This system can be said 
to be a very specialised version of a cryptographically se- 
cure, transaction-based state machine. Follow-up systems 
such as Namecoin adapted this original “currency appli- 
cation” of the technology into other applications albeit 
rather simplistic ones. 

Ethereum is a project which attempts to build the gen- 
eralised technology; technology on which all transaction- 
based state machine concepts may be built. Moreover it 
aims to provide to the end-developer a tightly integrated 
end-to-end system for building software on a hitherto un- 
explored compute paradigm in the mainstream: a trustful 
object messaging compute framework. 


1.1. Driving Factors. There are many goals of this 
project; one key goal is to facilitate transactions be- 
tween consenting individuals who would otherwise have 
no means to trust one another. This may be due to 
geographical separation, interfacing difficulty, or perhaps 
the incompatibility, incompetence, unwillingness, expense, 
uncertainty, inconvenience or corruption of existing legal 
systems. By specifying a state-change system through a 
rich and unambiguous language, and furthermore archi- 
tecting a system such that we can reasonably expect that 
an agreement will be thus enforced autonomously, we can 
provide a means to this end. 

Dealings in this proposed system would have several 
attributes not often found in the real world. The incor- 
ruptibility of judgement, often difficult to find, comes nat- 
urally from a disinterested algorithmic interpreter. Trans- 
parency, or being able to see exactly how a state or judge- 
ment came about through the transaction log and rules 
or instructional codes, never happens perfectly in human-based
systems since natural language is necessarily vague,
information is often lacking, and plain old prejudices are
difficult to shake.

Overall, I wish to provide a system such that users can 
be guaranteed that no matter with which other individ- 
uals, systems or organisations they interact, they can do 
so with absolute confidence in the possible outcomes and 
how those outcomes might come about. 

1.2. Previous Work. Buterin [2013a] first proposed the 
kernel of this work in late November, 2013. Though now 
evolved in many ways, the key functionality of a block- 
chain with a Turing-complete language and an effectively 
unlimited inter-transaction storage capability remains un- 
changed. 

Dwork and Naor [1992] provided the first work into the 
usage of a cryptographic proof of computational expendi- 
ture ( “proof-of-work” ) as a means of transmitting a value 
signal over the Internet. The value-signal was utilised here 
as a spam deterrence mechanism rather than any kind 
of currency, but critically demonstrated the potential for 
a basic data channel to carry a strong economic signal, 
allowing a receiver to make a physical assertion without 
having to rely upon trust. Back [2002] later produced a 
system in a similar vein. 

The first example of utilising the proof-of-work as a 
strong economic signal to secure a currency was by Vish- 
numurthy et al. [2003]. In this instance, the token was 
used to keep peer-to-peer file trading in check, provid- 
ing “consumers” with the ability to make micro-payments 
to “suppliers” for their services. The security model af- 
forded by the proof-of-work was augmented with digital 
signatures and a ledger in order to ensure that the histor- 
ical record couldn’t be corrupted and that malicious ac- 
tors could not spoof payment or unjustly complain about 
service delivery. Five years later, Nakamoto [2008] in- 
troduced another such proof-of-work-secured value token, 
somewhat wider in scope. The fruits of this project, Bit- 
coin, became the first widely adopted global decentralised 
transaction ledger. 

Other projects built on Bitcoin’s success; the alt-coins 
introduced numerous other currencies through alteration 
to the protocol. Some of the best known are Litecoin and 
Primecoin, discussed by Sprankel [2013]. Other projects 
sought to take the core value content mechanism of the 
protocol and repurpose it; Aron [2012] discusses, for example,
the Namecoin project, which aims to provide a decentralised
name-resolution system.

Other projects still aim to build upon the Bitcoin net- 
work itself, leveraging the large amount of value placed in 
the system and the vast amount of computation that goes 
into the consensus mechanism. The Mastercoin project, 
first proposed by Willett [2013], aims to build a richer 
protocol involving many additional high-level features on 
top of the Bitcoin protocol through utilisation of a num- 
ber of auxiliary parts to the core protocol. The Coloured 
Coins project, proposed by Rosenfeld [2012], takes a sim- 
ilar but more simplified strategy, embellishing the rules 
of a transaction in order to break the fungibility of Bit- 
coin’s base currency and allow the creation and tracking of 
tokens through a special “chroma-wallet”-protocol-aware 
piece of software. 

Additional work has been done in the area with discard- 
ing the decentralisation foundation; Ripple, discussed by 
Boutellier and Heinzen [2014], has sought to create a “fed- 
erated” system for currency exchange, effectively creating 
a new financial clearing system. It has demonstrated that 
high efficiency gains can be made if the decentralisation 
premise is discarded. 

Early work on smart contracts has been done by Szabo 
[1997] and Miller [1997]. Around the 1990s it became clear 
that algorithmic enforcement of agreements could become 
a significant force in human cooperation. Though no spe- 
cific system was proposed to implement such a system, 
it was proposed that the future of law would be heavily 
affected by such systems. In this light, Ethereum may 
be seen as a general implementation of such a crypto-law 
system. 

For a list of terms used in this paper, refer to Appendix 
A. 

2. The Blockchain Paradigm 

Ethereum, taken as a whole, can be viewed as a 
transaction-based state machine: we begin with a gene- 
sis state and incrementally execute transactions to morph 
it into some final state. It is this final state which we ac- 
cept as the canonical “version” of the world of Ethereum. 
The state can include such information as account bal- 
ances, reputations, trust arrangements, data pertaining 
to information of the physical world; in short, anything 
that can currently be represented by a computer is admis- 
sible. Transactions thus represent a valid arc between two 
states; the ‘valid’ part is important — there exist far more 
invalid state changes than valid state changes. Invalid 
state changes might, e.g., be things such as reducing an 
account balance without an equal and opposite increase 
elsewhere. A valid state transition is one which comes 
about through a transaction. Formally: 

(1) $\boldsymbol{\sigma}_{t+1} \equiv \Upsilon(\boldsymbol{\sigma}_t, T)$

where $\Upsilon$ is the Ethereum state transition function. In
Ethereum, $\Upsilon$, together with $\boldsymbol{\sigma}$, is considerably more
powerful than any existing comparable system; $\Upsilon$ allows
components to carry out arbitrary computation, while $\boldsymbol{\sigma}$
allows components to store arbitrary state between
transactions.

Transactions are collated into blocks; blocks are 
chained together using a cryptographic hash as a means of
reference. Blocks function as a journal or ledger, recording
a series of transactions together with the previous block
and an identifier for the final state (though blocks do not
store the final state itself; that would be far too big).
They also punctuate the transaction series with incentives 
for nodes to mine. This incentivisation takes place as a 
state-transition function, adding value to a nominated ac- 
count. 

Mining is the process of dedicating effort (working) to 
bolster one series of transactions (a block) over any other 
potential competitor block. It is achieved thanks to a 
cryptographically secure proof. This scheme is known as 
a proof-of-work and is discussed in detail in section 11.5.

Formally, we expand to:

(2) $\boldsymbol{\sigma}_{t+1} \equiv \Pi(\boldsymbol{\sigma}_t, B)$

(3) $B \equiv (\ldots, (T_0, T_1, \ldots))$

(4) $\Pi(\boldsymbol{\sigma}, B) \equiv \Omega(B, \Upsilon(\Upsilon(\boldsymbol{\sigma}, T_0), T_1) \ldots)$

Where $\Omega$ is the block-finalisation state transition function
(a function that rewards a nominated party); $B$ is
this block, which includes a series of transactions amongst
some other components; and $\Pi$ is the block-level
state-transition function.
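
As a rough illustration of equations (2) to (4), the block-level function can be read as a left fold of the transaction-level function over the block's transactions, followed by the finalisation step. The Python sketch below is only a toy model: the dict-based state, the transfer rule, the fixed reward and the names `apply_transaction`, `finalise_block` and `apply_block` are all assumptions made for illustration and are not defined by this paper.

```python
from functools import reduce

def apply_transaction(state, tx):
    """Toy stand-in for the transaction-level function: move value between accounts."""
    sender, recipient, value = tx
    if state.get(sender, 0) < value:
        # Simplification: an unaffordable transfer leaves the state unchanged here.
        return state
    new_state = dict(state)  # never mutate the input state
    new_state[sender] = new_state[sender] - value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state

def finalise_block(block, state, reward=5):
    """Toy stand-in for block finalisation: credit a reward to the nominated account."""
    new_state = dict(state)
    beneficiary = block["beneficiary"]
    new_state[beneficiary] = new_state.get(beneficiary, 0) + reward
    return new_state

def apply_block(state, block):
    """Block-level transition: fold the transactions in order, then finalise."""
    after_transactions = reduce(apply_transaction, block["transactions"], state)
    return finalise_block(block, after_transactions)

genesis = {"alice": 10, "bob": 0}
block = {
    "beneficiary": "miner",
    "transactions": [("alice", "bob", 4), ("bob", "alice", 1)],
}
final_state = apply_block(genesis, block)
```

Because each step returns a fresh mapping, the genesis state is left untouched, which mirrors the idea that earlier states remain recoverable.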

This is the basis of the blockchain paradigm, a model 
that forms the backbone of not only Ethereum, but all decentralised
consensus-based transaction systems to date.

2.1. Value. In order to incentivise computation within 
the network, there needs to be an agreed method for transmitting
value. To address this issue, Ethereum has an intrinsic
currency, Ether, known also as ETH and sometimes
referred to by the Old English Đ. The smallest subdenomination
of Ether, and thus the one in which all integer values
of the currency are counted, is the Wei. One Ether is
defined as being $10^{18}$ Wei. There exist other subdenominations
of Ether:

Multiplier   Name
$10^{0}$     Wei
$10^{12}$    Szabo
$10^{15}$    Finney
$10^{18}$    Ether


Throughout the present work, any reference to value, 
in the context of Ether, currency, a balance or a payment, 
should be assumed to be counted in Wei. 
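
Since every value is counted in Wei, converting between denominations is plain integer arithmetic. A minimal sketch in Python, using the multipliers from the table above; the helper names `to_wei` and `from_wei` are hypothetical and chosen only for this example.

```python
# Multipliers from the denomination table, expressed in Wei.
WEI_PER_UNIT = {
    "wei": 10**0,
    "szabo": 10**12,
    "finney": 10**15,
    "ether": 10**18,
}

def to_wei(amount, unit):
    """Convert a whole-number amount of the given denomination to Wei."""
    return amount * WEI_PER_UNIT[unit]

def from_wei(wei, unit):
    """Split a Wei amount into (whole units, remaining Wei) for a denomination."""
    return divmod(wei, WEI_PER_UNIT[unit])
```

Keeping amounts as integers avoids the rounding problems that floating-point Ether values would introduce.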

2.2. Which History? Since the system is decentralised 
and all parties have an opportunity to create a new block 
on some older pre-existing block, the resultant structure is 
necessarily a tree of blocks. In order to form a consensus as 
to which path, from root (the genesis block) to leaf (the 
block containing the most recent transactions) through 
this tree structure, known as the blockchain, there must 
be an agreed-upon scheme. If there is ever a disagree- 
ment between nodes as to which root-to-leaf path down 
the block tree is the ‘best’ blockchain, then a fork occurs. 

This would mean that past a given point in time 
(block), multiple states of the system may coexist: some 
nodes believing one block to contain the canonical transac- 
tions, other nodes believing some other block to be canoni- 
cal, potentially containing radically different or incompat- 
ible transactions. This is to be avoided at all costs as the 
uncertainty that would ensue would likely kill all confi- 
dence in the entire system. 


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER EIP-150 REVISION (e39fbc8 - 2017-09-26) 3 


The scheme we use in order to generate consensus is a 
simplified version of the GHOST protocol introduced by 
Sompolinsky and Zohar [2013]. This process is described 
in detail in section 10.

Sometimes, a path follows a new protocol from a partic- 
ular height. This document describes one version of the 
protocol. In order to follow back the history of a path, 
one might need to reference multiple versions of this doc- 
ument. 


3. Conventions 

I use a number of typographical conventions for the 
formal notation, some of which are quite particular to the 
present work: 

The two sets of highly structured, ‘top-level’, state values,
are denoted with bold lowercase Greek letters. They
fall into those of world-state, which are denoted $\boldsymbol{\sigma}$ (or a
variant thereupon) and those of machine-state, $\boldsymbol{\mu}$.

Functions operating on highly structured values are
denoted with an upper-case Greek letter, e.g. $\Upsilon$, the
Ethereum state transition function.

For most functions, an uppercase letter is used, e.g.
$C$, the general cost function. These may be subscripted
to denote specialised variants, e.g. $C_{\mathtt{SSTORE}}$, the cost
function for the SSTORE operation. For specialised and possibly
externally defined functions, I may format as typewriter
text, e.g. the Keccak-256 hash function (as per the winning
entry to the SHA-3 contest) is denoted KEC (and generally
referred to as plain Keccak). Also KEC512 refers to
the Keccak-512 hash function.

Tuples are typically denoted with an upper-case letter,
e.g. $T$ is used to denote an Ethereum transaction. This
symbol may, if accordingly defined, be subscripted to refer
to an individual component, e.g. $T_n$ denotes the nonce
of said transaction. The form of the subscript is used to
denote its type; e.g. uppercase subscripts refer to tuples
with subscriptable components.

Scalars and fixed-size byte sequences (or, synonymously,
arrays) are denoted with a normal lower-case letter,
e.g. $n$ is used in the document to denote a transaction
nonce. Those with a particularly special meaning may be
Greek, e.g. $\delta$, the number of items required on the stack
for a given operation.

Arbitrary-length sequences are typically denoted as a 
bold lower-case letter, e.g. $\mathbf{o}$ is used to denote the byte 
sequence given as the output data of a message call. For 
particularly important values, a bold uppercase letter may 
be used. 

Throughout, we assume scalars are positive integers
and thus belong to the set $\mathbb{P}$. The set of all byte sequences
is $\mathbb{B}$, formally defined in Appendix B. If such a set of
sequences is restricted to those of a particular length, it is
denoted with a subscript, thus the set of all byte sequences
of length 32 is named $\mathbb{B}_{32}$ and the set of all positive
integers smaller than $2^{256}$ is named $\mathbb{P}_{256}$. This is formally
defined in section 4.3.

Square brackets are used to index into and reference
individual components or subsequences of sequences, e.g.
$\boldsymbol{\mu}_\mathbf{s}[0]$ denotes the first item on the machine’s stack. For
subsequences, ellipses are used to specify the intended
range, to include elements at both limits, e.g. $\boldsymbol{\mu}_\mathbf{m}[0..31]$
denotes the first 32 items of the machine’s memory.


In the case of the global state $\boldsymbol{\sigma}$, which is a sequence of 
accounts, themselves tuples, the square brackets are used 
to reference an individual account. 

When considering variants of existing values, I follow
the rule that within a given scope for definition, if we
assume that the unmodified ‘input’ value be denoted by
the placeholder □ then the modified and utilisable value
is denoted as □′, and intermediate values would be □*,
□** &c. On very particular occasions, in order to maximise
readability and only if unambiguous in meaning, I
may use alpha-numeric subscripts to denote intermediate
values, especially those of particular note.

When considering the use of existing functions, given
a function $f$, the function $f^*$ denotes a similar, element-wise
version of the function mapping instead between sequences.
It is formally defined in section 4.3.

I define a number of useful functions throughout. One
of the more common is $\ell$, which evaluates to the last item
in the given sequence:

(5) $\ell(\mathbf{x}) \equiv \mathbf{x}[\|\mathbf{x}\| - 1]$

4. Blocks, State and Transactions 

Having introduced the basic concepts behind 
Ethereum, we will discuss the meaning of a transaction, a 
block and the state in more detail. 

4.1. World State. The world state (state), is a mapping
between addresses (160-bit identifiers) and account
states (a data structure serialised as RLP, see Appendix
B). Though not stored on the blockchain, it is assumed
that the implementation will maintain this mapping in a
modified Merkle Patricia tree (trie, see Appendix D). The 
trie requires a simple database backend that maintains a 
mapping of bytearrays to bytearrays; we name this under- 
lying database the state database. This has a number of 
benefits; firstly the root node of this structure is crypto- 
graphically dependent on all internal data and as such its 
hash can be used as a secure identity for the entire sys- 
tem state. Secondly, being an immutable data structure, 
it allows any previous state (whose root hash is known) to 
be recalled by simply altering the root hash accordingly. 
Since we store all such root hashes in the blockchain, we 
are able to trivially revert to old states. 

The account state comprises the following four fields:
nonce: A scalar value equal to the number of transactions
sent from this address or, in the case
of accounts with associated code, the number of
contract-creations made by this account. For account
of address $a$ in state $\boldsymbol{\sigma}$, this would be formally
denoted $\boldsymbol{\sigma}[a]_n$.
balance: A scalar value equal to the number of Wei
owned by this address. Formally denoted $\boldsymbol{\sigma}[a]_b$.
storageRoot: A 256-bit hash of the root node of a
Merkle Patricia tree that encodes the storage contents
of the account (a mapping between 256-bit
integer values), encoded into the trie as a mapping
from the Keccak 256-bit hash of the 256-bit
integer keys to the RLP-encoded 256-bit integer
values. The hash is formally denoted $\boldsymbol{\sigma}[a]_s$.
codeHash: The hash of the EVM code of this
account; this is the code that gets executed
should this address receive a message call; it is
immutable and thus, unlike all other fields, cannot
be changed after construction. All such code
fragments are contained in the state database under
their corresponding hashes for later retrieval.
This hash is formally denoted $\boldsymbol{\sigma}[a]_c$, and thus the
code may be denoted as $\mathbf{b}$, given that
$\mathtt{KEC}(\mathbf{b}) = \boldsymbol{\sigma}[a]_c$.

Since I typically wish to refer not to the trie’s root hash
but to the underlying set of key/value pairs stored within,
I define a convenient equivalence:

(6) $\mathtt{TRIE}\big(L_I^*(\boldsymbol{\sigma}[a]_\mathbf{s})\big) \equiv \boldsymbol{\sigma}[a]_s$

The collapse function for the set of key/value pairs in
the trie, $L_I^*$, is defined as the element-wise transformation
of the base function $L_I$, given as:

(7) $L_I\big((k, v)\big) \equiv \big(\mathtt{KEC}(k), \mathtt{RLP}(v)\big)$

where:

(8) $k \in \mathbb{B}_{32} \wedge v \in \mathbb{P}$

It shall be understood that $\boldsymbol{\sigma}[a]_\mathbf{s}$ is not a ‘physical’
member of the account and does not contribute to its later
serialisation.

If the codeHash field is the Keccak-256 hash of the
empty string, i.e. $\boldsymbol{\sigma}[a]_c = \mathtt{KEC}\big(()\big)$, then the node
represents a simple account, sometimes referred to as a
“non-contract” account.

Thus we may define a world-state collapse function $L_S$:

(9) $L_S(\boldsymbol{\sigma}) \equiv \{p(a) : \boldsymbol{\sigma}[a] \neq \varnothing\}$

where

(10) $p(a) \equiv \big(\mathtt{KEC}(a), \mathtt{RLP}\big((\boldsymbol{\sigma}[a]_n, \boldsymbol{\sigma}[a]_b, \boldsymbol{\sigma}[a]_s, \boldsymbol{\sigma}[a]_c)\big)\big)$

This function, $L_S$, is used alongside the trie function
to provide a short identity (hash) of the world state. We
assume:

(11) $\forall a : \boldsymbol{\sigma}[a] = \varnothing \vee (a \in \mathbb{B}_{20} \wedge v(\boldsymbol{\sigma}[a]))$

where $v$ is the account validity function:

(12) $v(x) \equiv x_n \in \mathbb{P}_{256} \wedge x_b \in \mathbb{P}_{256} \wedge x_s \in \mathbb{B}_{32} \wedge x_c \in \mathbb{B}_{32}$
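
The account validity function of equation (12) and the address constraint of equation (11) amount to simple range and length checks. The Python sketch below illustrates them under stated assumptions: the `Account` tuple, the function names, and the reading of the paper's scalars as integers from zero upwards (a fresh account has nonce 0) are choices made for this example only.

```python
from typing import NamedTuple

class Account(NamedTuple):
    nonce: int           # transactions sent, or contract-creations made
    balance: int         # balance in Wei
    storage_root: bytes  # 32-byte root hash of the storage trie
    code_hash: bytes     # 32-byte hash of the account's EVM code

def is_valid_account(account: Account) -> bool:
    """Equation (12): two scalars below 2**256 and two 32-byte hashes."""
    return (
        0 <= account.nonce < 2**256
        and 0 <= account.balance < 2**256
        and len(account.storage_root) == 32
        and len(account.code_hash) == 32
    )

def is_valid_entry(address: bytes, account: Account) -> bool:
    """Equation (11) for a non-empty entry: a 20-byte address and a valid account."""
    return len(address) == 20 and is_valid_account(account)
```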

4.2. The Transaction. A transaction (formally, $T$) is a
single cryptographically-signed instruction constructed by
an actor externally to the scope of Ethereum. While it is
assumed that the ultimate external actor will be human in
nature, software tools will be used in its construction and
dissemination¹. There are two types of transactions: those
which result in message calls and those which result in
the creation of new accounts with associated code (known
informally as ‘contract creation’). Both types specify a
number of common fields:

nonce: A scalar value equal to the number of transactions
sent by the sender; formally $T_n$.
gasPrice: A scalar value equal to the number of
Wei to be paid per unit of gas for all computation
costs incurred as a result of the execution of
this transaction; formally $T_p$.
gasLimit: A scalar value equal to the maximum
amount of gas that should be used in executing
this transaction. This is paid up-front, before any
computation is done and may not be increased
later; formally $T_g$.
to: The 160-bit address of the message call’s recipient
or, for a contract creation transaction, $\varnothing$, used
here to denote the only member of $\mathbb{B}_0$; formally
$T_t$.
value: A scalar value equal to the number of Wei to
be transferred to the message call’s recipient or,
in the case of contract creation, as an endowment
to the newly created account; formally $T_v$.
v, r, s: Values corresponding to the signature of the
transaction and used to determine the sender of
the transaction; formally $T_w$, $T_r$ and $T_s$. This is
expanded in Appendix F.

Additionally, a contract creation transaction contains:
init: An unlimited size byte array specifying the
EVM-code for the account initialisation procedure,
formally $T_\mathbf{i}$.

init is an EVM-code fragment; it returns the body,
a second fragment of code that executes each time the
account receives a message call (either through a transaction
or due to the internal execution of code). init is
executed only once at account creation and gets discarded
immediately thereafter.

In contrast, a message call transaction contains:

data: An unlimited size byte array specifying the
input data of the message call, formally $T_\mathbf{d}$.

(13) $L_T(T) \equiv \begin{cases} (T_n, T_p, T_g, T_t, T_v, T_\mathbf{i}, T_w, T_r, T_s) & \text{if } T_t = \varnothing \\ (T_n, T_p, T_g, T_t, T_v, T_\mathbf{d}, T_w, T_r, T_s) & \text{otherwise} \end{cases}$

Here, we assume all components are interpreted by the
RLP as integer values, with the exception of the arbitrary
length byte arrays $T_\mathbf{i}$ and $T_\mathbf{d}$.

(14) $T_n \in \mathbb{P}_{256} \wedge T_v \in \mathbb{P}_{256} \wedge T_p \in \mathbb{P}_{256} \wedge$
     $T_g \in \mathbb{P}_{256} \wedge T_w \in \mathbb{P}_5 \wedge T_r \in \mathbb{P}_{256} \wedge$
     $T_s \in \mathbb{P}_{256} \wedge T_\mathbf{d} \in \mathbb{B} \wedge T_\mathbf{i} \in \mathbb{B}$

where

(15) $\mathbb{P}_n = \{P : P \in \mathbb{P} \wedge P < 2^n\}$

The address hash $T_t$ is slightly different: it is either a
20-byte address hash or, in the case of being a contract-creation
transaction (and thus formally equal to $\varnothing$), it is
the RLP empty byte sequence and thus the member of $\mathbb{B}_0$:

(16) $T_t \in \begin{cases} \mathbb{B}_{20} & \text{if } T_t \neq \varnothing \\ \mathbb{B}_0 & \text{otherwise} \end{cases}$
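
To make the field layout concrete, the sketch below models a transaction as a Python tuple, orders its fields as in equation (13), and applies the range checks of equations (14) to (16). The `Transaction` type, the use of `None` for a contract creation's empty `to` field, and the helper names are assumptions made for this illustration; signature recovery and RLP encoding (covered in the appendices) are deliberately left out.

```python
from typing import NamedTuple, Optional

class Transaction(NamedTuple):
    nonce: int            # T_n
    gas_price: int        # T_p, Wei per unit of gas
    gas_limit: int        # T_g
    to: Optional[bytes]   # T_t: 20-byte address, or None for contract creation
    value: int            # T_v, in Wei
    payload: bytes        # init code (contract creation) or input data (message call)
    v: int                # signature values T_w, T_r, T_s
    r: int
    s: int

def is_contract_creation(tx: Transaction) -> bool:
    return tx.to is None

def prepare(tx: Transaction) -> tuple:
    """Order the fields as in equation (13), ready to hand to a serialiser."""
    # The empty byte string stands in for the sole member of B_0.
    to_field = b"" if tx.to is None else tx.to
    return (tx.nonce, tx.gas_price, tx.gas_limit, to_field,
            tx.value, tx.payload, tx.v, tx.r, tx.s)

def has_valid_types(tx: Transaction) -> bool:
    """Range and length checks in the spirit of equations (14) to (16)."""
    scalars = (tx.nonce, tx.gas_price, tx.gas_limit, tx.value, tx.r, tx.s)
    scalars_ok = all(0 <= n < 2**256 for n in scalars)
    v_ok = 0 <= tx.v < 2**5
    to_ok = tx.to is None or len(tx.to) == 20
    return scalars_ok and v_ok and to_ok

call = Transaction(0, 20 * 10**9, 21000, bytes(20), 10**18, b"", 27, 1, 1)
create = call._replace(to=None, payload=b"\x60\x00")
```

Using a single `payload` slot for both init code and call data reflects that the two cases of equation (13) differ only in how the sixth component is interpreted.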


¹ Notably, such ‘tools’ could ultimately become so causally removed from their human-based initiation, or humans may become so
causally-neutral, that there could be a point at which they rightly be considered autonomous agents, e.g. contracts may offer bounties to
humans for being sent transactions to initiate their execution.

² ommer is the most prevalent (which is not saying much, as it is not a well-known word) gender-neutral term to mean “sibling of
parent”; see http://nonbinary.org/wiki/Gender_neutral_language#Family_Terms

4.3. The Block. The block in Ethereum is the collection
of relevant pieces of information (known as the block
header), $H$, together with information corresponding to
the comprised transactions, $\mathbf{T}$, and a set of other block
headers $\mathbf{U}$ that are known to have a parent equal to the
present block’s parent’s parent (such blocks are known as
ommers²). The block header contains several pieces of
information:

parentHash: The Keccak 256-bit hash of the parent
block’s header, in its entirety; formally $H_p$.
ommersHash: The Keccak 256-bit hash of the ommers
list portion of this block; formally $H_o$.
beneficiary: The 160-bit address to which all fees
collected from the successful mining of this block
shall be transferred; formally $H_c$.
stateRoot: The Keccak 256-bit hash of the root
node of the state trie, after all transactions are
executed and finalisations applied; formally $H_r$.
transactionsRoot: The Keccak 256-bit hash of the
root node of the trie structure populated with
each transaction in the transactions list portion
of the block; formally $H_t$.
receiptsRoot: The Keccak 256-bit hash of the root
node of the trie structure populated with the receipts
of each transaction in the transactions list
portion of the block; formally $H_e$.
logsBloom: The Bloom filter composed from indexable
information (logger address and log topics)
contained in each log entry from the receipt of
each transaction in the transactions list; formally
$H_b$.
difficulty: A scalar value corresponding to the difficulty
level of this block. This can be calculated
from the previous block’s difficulty level and the
timestamp; formally $H_d$.
number: A scalar value equal to the number of ancestor
blocks. The genesis block has a number of
zero; formally $H_i$.
gasLimit: A scalar value equal to the current limit
of gas expenditure per block; formally $H_l$.
gasUsed: A scalar value equal to the total gas used
in transactions in this block; formally $H_g$.
timestamp: A scalar value equal to the reasonable
output of Unix’s time() at this block’s inception;
formally $H_s$.
extraData: An arbitrary byte array containing
data relevant to this block. This must be 32 bytes
or fewer; formally $H_x$.
mixHash: A 256-bit hash which proves, combined
with the nonce, that a sufficient amount of computation
has been carried out on this block; formally
$H_m$.
nonce: A 64-bit hash which proves, combined with
the mix-hash, that a sufficient amount of computation
has been carried out on this block; formally
$H_n$.

The other two components in the block are simply a list
of ommer block headers (of the same format as above),
$B_\mathbf{U}$, and a series of the transactions, $B_\mathbf{T}$. Formally, we
can refer to a block $B$:

(17) $B \equiv (B_H, B_\mathbf{T}, B_\mathbf{U})$

4.3.1. Transaction Receipt. In order to encode information
about a transaction concerning which it may be useful
to form a zero-knowledge proof, or index and search,
we encode a receipt of each transaction containing certain
information concerning its execution. Each receipt,
denoted $B_\mathbf{R}[i]$ for the $i$th transaction, is placed in
an index-keyed trie and the root recorded in the header as
$H_e$.

The transaction receipt is a tuple of four items comprising
the post-transaction state, $R_{\boldsymbol{\sigma}}$, the cumulative gas
used in the block containing the transaction receipt as of
immediately after the transaction has happened, $R_u$, the
set of logs created through execution of the transaction, $R_\mathbf{l}$
and the Bloom filter composed from information in those
logs, $R_b$:

(18) $R \equiv (R_{\boldsymbol{\sigma}}, R_u, R_b, R_\mathbf{l})$

The function $L_R$ trivially prepares a transaction receipt
for being transformed into an RLP-serialised byte array:

(19) $L_R(R) \equiv (\mathtt{TRIE}(L_S(R_{\boldsymbol{\sigma}})), R_u, R_b, R_\mathbf{l})$

thus the post-transaction state, $R_{\boldsymbol{\sigma}}$, is encoded into a trie
structure, the root of which forms the first item.

We assert $R_u$, the cumulative gas used, is a positive integer
and that the logs Bloom, $R_b$, is a hash of size 2048
bits (256 bytes):

(20) $R_u \in \mathbb{P} \wedge R_b \in \mathbb{B}_{256}$

The log entries, $R_\mathbf{l}$, is a series of log entries, termed,
for example, $(O_0, O_1, \ldots)$. A log entry, $O$, is a tuple of a
logger’s address, $O_a$, a series of 32-bytes log topics, $O_\mathbf{t}$
and some number of bytes of data, $O_\mathbf{d}$:

(21) $O \equiv (O_a, (O_{\mathbf{t}0}, O_{\mathbf{t}1}, \ldots), O_\mathbf{d})$

(22) $O_a \in \mathbb{B}_{20} \wedge \forall t \in O_\mathbf{t} : t \in \mathbb{B}_{32} \wedge O_\mathbf{d} \in \mathbb{B}$

We define the Bloom filter function, $M$, to reduce a log
entry into a single 256-byte hash:

(23) $M(O) \equiv \bigvee_{t \in \{O_a\} \cup O_\mathbf{t}} \big(M_{3:2048}(t)\big)$

where $M_{3:2048}$ is a specialised Bloom filter that sets
three bits out of 2048, given an arbitrary byte sequence.
It does this through taking the low-order 11 bits of each
of the first three pairs of bytes in a Keccak-256 hash of
the byte sequence. (Since $2^{11} = 2048$, the low-order 11 bits
are the operand modulo 2048; the operand is in this case
each of the first three pairs of bytes in a Keccak-256 hash
of the byte sequence.) Formally:

(24) $M_{3:2048}(\mathbf{x} : \mathbf{x} \in \mathbb{B}) \equiv \mathbf{y} : \mathbf{y} \in \mathbb{B}_{256}$ where:

(25) $\mathbf{y} = (0, 0, \ldots, 0)$ except:

(26) $\forall i \in \{0, 2, 4\} : \mathcal{B}_{m(\mathbf{x}, i)}(\mathbf{y}) = 1$

(27) $m(\mathbf{x}, i) \equiv \mathtt{KEC}(\mathbf{x})[i, i + 1] \bmod 2048$

where $\mathcal{B}$ is the bit reference function such that $\mathcal{B}_j(\mathbf{x})$
equals the bit of index $j$ (indexed from 0) in the byte array $\mathbf{x}$.
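
The three-bits-in-2048 construction of equations (23) to (27) can be sketched in a few lines of Python. Several things here are assumptions and should be read as such: the standard library has no Keccak-256, so the hash is passed in as a parameter and the usage example substitutes `hashlib.sha3_256`, which is the NIST SHA-3 variant and not the Keccak-256 used by Ethereum; each byte pair is read as a big-endian integer; and the filter is held as an integer bit mask rather than a 256-byte array, so no particular byte layout is implied.

```python
import hashlib

def bloom_bits(data: bytes, digest) -> list:
    """Three bit indices in [0, 2048) derived from one byte sequence."""
    h = digest(data)
    # Byte pairs at offsets (0,1), (2,3), (4,5); keep the low-order 11 bits of each.
    return [int.from_bytes(h[i:i + 2], "big") % 2048 for i in (0, 2, 4)]

def log_bloom(address: bytes, topics, digest) -> int:
    """Bloom filter of one log entry: OR together the bits of address and topics."""
    mask = 0
    for item in [address, *topics]:
        for bit in bloom_bits(item, digest):
            mask |= 1 << bit
    return mask

def might_contain(mask: int, item: bytes, digest) -> bool:
    """True if all three bits for item are set (false positives are possible)."""
    return all((mask >> bit) & 1 for bit in bloom_bits(item, digest))

def sha3_stand_in(data: bytes) -> bytes:
    # Stand-in only: SHA3-256 differs from Keccak-256 in its padding.
    return hashlib.sha3_256(data).digest()

example_mask = log_bloom(b"\x01" * 20, [b"\x02" * 32], sha3_stand_in)
```

Combining the filters of several log entries is then a bitwise OR of their masks, which is what makes a block-level filter cheap to build from per-receipt filters.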


4.3.2. Holistic Validity. We can assert a block’s validity if
and only if it satisfies several conditions: it must be internally
consistent with the ommer and transaction block
hashes and the given transactions $B_\mathbf{T}$ (as specified in section
11), when executed in order on the base state $\boldsymbol{\sigma}$ (derived
from the final state of the parent block), result in a new
state of the identity $H_r$:

(28) $H_r \equiv \mathtt{TRIE}(L_S(\Pi(\boldsymbol{\sigma}, B))) \quad \wedge$
     $H_o \equiv \mathtt{KEC}(\mathtt{RLP}(L_H^*(B_\mathbf{U}))) \quad \wedge$
     $H_t \equiv \mathtt{TRIE}(\{\forall i < \|B_\mathbf{T}\|, i \in \mathbb{P} : p(i, L_T(B_\mathbf{T}[i]))\}) \quad \wedge$
     $H_e \equiv \mathtt{TRIE}(\{\forall i < \|B_\mathbf{R}\|, i \in \mathbb{P} : p(i, L_R(B_\mathbf{R}[i]))\}) \quad \wedge$
     $H_b \equiv \bigvee_{\mathbf{r} \in B_\mathbf{R}} (\mathbf{r}_b)$

where $p(k, v)$ is simply the pairwise RLP transformation,
in this case, the first being the index of the transaction
in the block and the second being the transaction
receipt:

(29) $p(k, v) \equiv (\mathtt{RLP}(k), \mathtt{RLP}(v))$

Furthermore:

(30) $\mathtt{TRIE}(L_S(\boldsymbol{\sigma})) = P(B_H)_{H_r}$

Thus $\mathtt{TRIE}(L_S(\boldsymbol{\sigma}))$ is the root node hash of the Merkle
Patricia tree structure containing the key-value pairs of
the state $\boldsymbol{\sigma}$ with values encoded using RLP, and $P(B_H)$
is the parent block of $B$, defined directly.

The values stemming from the computation of transactions,
specifically the transaction receipts, $B_\mathbf{R}$, and that
defined through the transactions state-accumulation function,
$\Pi$, are formalised later in section 11.4.


4.3.3. Serialisation. The function $L_B$ and $L_H$ are the
preparation functions for a block and block header respectively.
Much like the transaction receipt preparation function
$L_R$, we assert the types and order of the structure for
when the RLP transformation is required:

(31) $L_H(H) \equiv (H_p, H_o, H_c, H_r, H_t, H_e, H_b, H_d, H_i, H_l, H_g, H_s, H_x, H_m, H_n)$

(32) $L_B(B) \equiv (L_H(B_H), L_T^*(B_\mathbf{T}), L_H^*(B_\mathbf{U}))$

With $L_T^*$ and $L_H^*$ being element-wise sequence transformations,
thus:

(33) $f^*((x_0, x_1, \ldots)) \equiv (f(x_0), f(x_1), \ldots)$ for any function $f$

The component types are defined thus:

(34) $H_p \in \mathbb{B}_{32} \wedge H_o \in \mathbb{B}_{32} \wedge H_c \in \mathbb{B}_{20} \wedge$
     $H_r \in \mathbb{B}_{32} \wedge H_t \in \mathbb{B}_{32} \wedge H_e \in \mathbb{B}_{32} \wedge$
     $H_b \in \mathbb{B}_{256} \wedge H_d \in \mathbb{P} \wedge H_i \in \mathbb{P} \wedge$
     $H_l \in \mathbb{P} \wedge H_g \in \mathbb{P} \wedge H_s \in \mathbb{P}_{256} \wedge$
     $H_x \in \mathbb{B} \wedge H_m \in \mathbb{B}_{32} \wedge H_n \in \mathbb{B}_8$

where

(35) $\mathbb{B}_n = \{B : B \in \mathbb{B} \wedge \|B\| = n\}$

We now have a rigorous specification for the construction
of a formal block structure. The RLP function RLP
(see Appendix B) provides the canonical method for transforming
this structure into a sequence of bytes ready for
transmission over the wire or storage locally.


4.3.4. Block Header Validity. We define P(Bh ) to be the 
parent block of B, formally: 

(36) P(H) - B ’ : KEC(RLP(Bjj)) - //„ 

The block number is the parent’s block number incre- 
mented by one: 

(37) Hi - /’;//;■/,) + 1 

The canonical difficulty of a block of header $H$ is defined
as $D(H)$:

(38) $D(H) \equiv \begin{cases} D_0 & \text{if } H_i = 0 \\ \max(D_0, P(H)_{H_d} + x \times \varsigma_2 + \epsilon) & \text{otherwise} \end{cases}$

where:

(39) $D_0 \equiv 131072$

(40) $x \equiv \lfloor P(H)_{H_d} / 2048 \rfloor$

(41) $\varsigma_2 \equiv \max(1 - \lfloor (H_s - P(H)_{H_s}) / 10 \rfloor, -99)$

(42) $\epsilon \equiv \lfloor 2^{\lfloor H_i \div 100000 \rfloor - 2} \rfloor$

The canonical gas limit $H_l$ of a block of header $H$ must
fulfil the relation:

(43) $H_l < P(H)_{H_l} + \lfloor P(H)_{H_l} / 1024 \rfloor$

(44) $H_l > P(H)_{H_l} - \lfloor P(H)_{H_l} / 1024 \rfloor$

(45) $H_l \geqslant 125000$

$H_s$ is the timestamp of block $H$ and must fulfil the
relation:

(46) $H_s > P(H)_{H_s}$

This mechanism enforces a homeostasis in terms of the
time between blocks; a smaller period between the last two
blocks results in an increase in the difficulty level and thus
additional computation required, lengthening the likely
next period. Conversely, if the period is too large, the
difficulty, and expected time to the next block, is reduced.

The nonce of a block, $H_n$, must satisfy the relations:

(47) $n \leqslant \frac{2^{256}}{H_d} \wedge m = H_m$

with $(n, m) = \mathtt{PoW}(H_{\not n}, H_n, \mathbf{d})$.

Where $H_{\not n}$ is the new block’s header $H$, but without the
nonce and mix-hash block components, $\mathbf{d}$ being the current
DAG, a large data set needed to compute the mix-hash, and
$\mathtt{PoW}$ is the proof-of-work function (see section 11.5): this
evaluates to an array with the first item being the mix-hash,
to prove that a correct DAG has been used, and the second
item being a pseudo-random number cryptographically
dependent on $H$ and $\mathbf{d}$. Given an approximately uniform
distribution in the range $[0, 2^{64})$, the expected time to find
a solution is proportional to the difficulty, $H_d$.

This is the foundation of the security of the blockchain
and is the fundamental reason why a malicious node cannot
propagate newly created blocks that would otherwise
overwrite (“rewrite”) history. Because the block nonce
must satisfy this requirement, and because its satisfaction
depends on the contents of the block and in turn its
composed transactions, creating new, valid, blocks is difficult
and, over time, requires approximately the total compute
power of the trustworthy portion of the mining peers.

Thus we are able to define the block header validity
function $V(H)$:

(48) $V(H) \equiv n \leqslant \frac{2^{256}}{H_d} \wedge m = H_m \wedge$
(49) $H_d = D(H) \wedge$
(50) $H_g \leqslant H_l \wedge$
(51) $H_l < P(H)_{H_l} + \lfloor P(H)_{H_l} / 1024 \rfloor \wedge$
(52) $H_l > P(H)_{H_l} - \lfloor P(H)_{H_l} / 1024 \rfloor \wedge$
(53) $H_l \geqslant 125000 \wedge$
(54) $H_s > P(H)_{H_s} \wedge$
(55) $H_i = P(H)_{H_i} + 1 \wedge$
(56) $\|H_x\| \leqslant 32$

where $(n, m) = \mathtt{PoW}(H_{\not n}, H_n, \mathbf{d})$

Noting additionally that extraData must be at most 
32 bytes. 
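
As an illustration of equations (38) to (46) and of the clauses of $V(H)$ that do not involve the proof-of-work, the following Python sketch may help. The `Header` container and its field names are hypothetical and chosen only for this example; the proof-of-work clause (48) is deliberately left out.

```python
from dataclasses import dataclass

D0 = 131072  # minimum difficulty, equation (39)


@dataclass
class Header:
    """Hypothetical container holding only the fields used below."""
    number: int       # H_i
    difficulty: int   # H_d
    gas_limit: int    # H_l
    gas_used: int     # H_g
    timestamp: int    # H_s
    extra_data: bytes = b""  # H_x


def canonical_difficulty(header: Header, parent: Header) -> int:
    """D(H) as given by equations (38) to (42)."""
    if header.number == 0:
        return D0
    x = parent.difficulty // 2048
    sigma2 = max(1 - (header.timestamp - parent.timestamp) // 10, -99)
    exponent = header.number // 100000 - 2
    # floor(2 ** exponent) is zero whenever the exponent is negative
    epsilon = 2 ** exponent if exponent >= 0 else 0
    return max(D0, parent.difficulty + x * sigma2 + epsilon)


def gas_limit_valid(header: Header, parent: Header) -> bool:
    """Equations (43) to (45)."""
    bound = parent.gas_limit // 1024
    return (parent.gas_limit - bound < header.gas_limit < parent.gas_limit + bound
            and header.gas_limit >= 125000)


def header_fields_valid(header: Header, parent: Header) -> bool:
    """Clauses (49) to (56) of V(H); the nonce check (48) is omitted."""
    return (header.difficulty == canonical_difficulty(header, parent)
            and header.gas_used <= header.gas_limit
            and gas_limit_valid(header, parent)
            and header.timestamp > parent.timestamp
            and header.number == parent.number + 1
            and len(header.extra_data) <= 32)
```

With a parent difficulty of 2048000, a child mined 5 seconds later has its difficulty raised by one step of 1000, while a child mined 25 seconds later has it lowered by the same amount.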


5. Gas and Payment 

In order to avoid issues of network abuse and to sidestep
the inevitable questions stemming from Turing completeness,
all programmable computation in Ethereum is subject to fees.
The fee schedule is specified in units of gas (see Appendix G
for the fees associated with various computation). Thus any
given fragment of programmable computation (this includes
creating contracts, making message calls, utilising and
accessing account storage and executing operations on the
virtual machine) has a universally agreed cost in terms of gas.

Every transaction has a specific amount of gas associ- 
ated with it: gasLimit. This is the amount of gas which 
is implicitly purchased from the sender’s account balance. 
The purchase happens at the according gasPrice, also 
specified in the transaction. The transaction is considered 
invalid if the account balance cannot support such a pur- 
chase. It is named gasLimit since any unused gas at the 
end of the transaction is refunded (at the same rate of pur- 
chase) to the sender’s account. Gas does not exist outside 
of the execution of a transaction. Thus for accounts with 
trusted code associated, a relatively high gas limit may be 
set and left alone. 

In general, Ether used to purchase gas that is not refunded
is delivered to the beneficiary address, the address of an
account typically under the control of the miner. Transactors
are free to specify any gasPrice that they wish; however,
miners are free to ignore transactions as they choose. A
higher gas price on a transaction will therefore cost the
sender more in terms of Ether and deliver a greater value to
the miner, and thus will more likely be selected for inclusion
by more miners. Miners, in general, will choose to advertise
the minimum gas price for which they will execute
transactions and transactors will be free to canvass these
prices in determining what gas price to offer. Since there will
be a (weighted) distribution of minimum acceptable gas
prices, transactors will necessarily have a trade-off to make
between lowering the gas price and maximising the chance
that their transaction will be mined in a timely manner.


6. Transaction Execution 

The execution of a transaction is the most complex part
of the Ethereum protocol: it defines the state transition
function $\Upsilon$. It is assumed that any transactions executed
first pass the initial tests of intrinsic validity. These
include:

(1) The transaction is well-formed RLP, with no ad- 
ditional trailing bytes; 

(2) the transaction signature is valid; 

(3) the transaction nonce is valid (equivalent to the
sender account’s current nonce);

(4) the gas limit is no smaller than the intrinsic gas,
$g_0$, used by the transaction;

(5) the sender account balance contains at least the
cost, $v_0$, required in up-front payment.


Formally, we consider the function $\Upsilon$, with $T$ being a
transaction and $\boldsymbol{\sigma}$ the state:

(57) $\boldsymbol{\sigma}' = \Upsilon(\boldsymbol{\sigma}, T)$

Thus $\boldsymbol{\sigma}'$ is the post-transactional state. We also define
$\Upsilon^g$ to evaluate to the amount of gas used in the execution
of a transaction and $\Upsilon^\mathbf{l}$ to evaluate to the transaction’s
accrued log items, both to be formally defined later.


6.1. Substate. Throughout transaction execution, we 
accrue certain information that is acted upon immediately 
following the transaction. We call this transaction sub- 
state, and represent it as A, which is a tuple: 


(58) $A \equiv (A_\mathbf{s}, A_\mathbf{l}, A_r)$

The tuple contents include $A_\mathbf{s}$, the self-destruct set: a
set of accounts that will be discarded following the
transaction’s completion. $A_\mathbf{l}$ is the log series: this is a series
of archived and indexable ‘checkpoints’ in VM code
execution that allow for contract-calls to be easily tracked by
onlookers external to the Ethereum world (such as
decentralised application front-ends). Finally there is $A_r$, the
refund balance, increased through using the SSTORE
instruction in order to reset contract storage to zero from
some non-zero value. Though not immediately refunded,
it is allowed to partially offset the total execution costs.

For brevity, we define the empty substate $A^0$ to have
no self-destructs, no logs and a zero refund balance:

(59) $A^0 \equiv (\varnothing, (), 0)$

6.2. Execution. We define intrinsic gas $g_0$, the amount
of gas this transaction requires to be paid prior to
execution, as follows:

(60) $g_0 \equiv \sum_{i \in T_\mathbf{i}, T_\mathbf{d}} \begin{cases} G_{txdatazero} & \text{if } i = 0 \\ G_{txdatanonzero} & \text{otherwise} \end{cases}$
(61) $\quad + \begin{cases} G_{txcreate} & \text{if } T_t = \varnothing \\ 0 & \text{otherwise} \end{cases}$
(62) $\quad + G_{transaction}$

where $T_\mathbf{i}, T_\mathbf{d}$ means the series of bytes of the
transaction’s associated data and initialisation EVM-code,
depending on whether the transaction is for contract-creation
or message-call. $G_{txcreate}$ is added if the transaction is
contract-creating, but not if a result of EVM-code.
$G$ is fully defined in Appendix G.

The up-front cost $v_0$ is calculated as:

(63) $v_0 \equiv T_g T_p + T_v$

The validity is determined as:

(64) $S(T) \neq \varnothing \wedge$
     $\boldsymbol{\sigma}[S(T)] \neq \varnothing \wedge$
     $T_n = \boldsymbol{\sigma}[S(T)]_n \wedge$
     $g_0 \leqslant T_g \wedge$
     $v_0 \leqslant \boldsymbol{\sigma}[S(T)]_b \wedge$
     $T_g \leqslant B_{H_l} - \ell(B_\mathbf{R})_u$

Note the final condition; the sum of the transaction’s
gas limit, $T_g$, and the gas utilised in this block prior, given
by $\ell(B_\mathbf{R})_u$, must be no greater than the block’s gasLimit,
$B_{H_l}$.
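
The intrinsic gas (60) to (62), the up-front cost (63) and the validity test (64) can be sketched in Python as below. The fee constants are the values assumed here for the Appendix G schedule and should be checked against it; the dictionary layout for transactions and accounts is hypothetical, and signature recovery (the function $S$) is not modelled: a missing sender is simply passed as `None`.

```python
from typing import Optional

# Assumed fee-schedule values (see Appendix G for the authoritative ones)
G_TRANSACTION = 21000
G_TXCREATE = 32000
G_TXDATAZERO = 4
G_TXDATANONZERO = 68


def intrinsic_gas(data: bytes, to: Optional[bytes]) -> int:
    """g_0: per-byte data cost, plus creation cost when there is no recipient."""
    data_cost = sum(G_TXDATAZERO if b == 0 else G_TXDATANONZERO for b in data)
    create_cost = G_TXCREATE if to is None else 0
    return data_cost + create_cost + G_TRANSACTION


def upfront_cost(gas_limit: int, gas_price: int, value: int) -> int:
    """v_0 = T_g * T_p + T_v."""
    return gas_limit * gas_price + value


def transaction_valid(tx: dict, sender: Optional[dict],
                      block_gas_limit: int, block_gas_used: int) -> bool:
    """The conjunction of equation (64)."""
    if sender is None:  # stands in for S(T) or its account being empty
        return False
    g0 = intrinsic_gas(tx["data"], tx["to"])
    v0 = upfront_cost(tx["gas_limit"], tx["gas_price"], tx["value"])
    return (tx["nonce"] == sender["nonce"]
            and g0 <= tx["gas_limit"]
            and v0 <= sender["balance"]
            and tx["gas_limit"] <= block_gas_limit - block_gas_used)
```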

The execution of a valid transaction begins with an
irrevocable change made to the state: the nonce of the
account of the sender, $S(T)$, is incremented by one and the
balance is reduced by part of the up-front cost, $T_g T_p$. The
gas available for the proceeding computation, $g$, is defined
as $T_g - g_0$. The computation, whether contract creation
or a message call, results in an eventual state (which may
legally be equivalent to the current state), the change to
which is deterministic and never invalid: there can be no
invalid transactions from this point.

We define the checkpoint state $\boldsymbol{\sigma}_0$:

(65) $\boldsymbol{\sigma}_0 \equiv \boldsymbol{\sigma}$ except:

(66) $\boldsymbol{\sigma}_0[S(T)]_b \equiv \boldsymbol{\sigma}[S(T)]_b - T_g T_p$

(67) $\boldsymbol{\sigma}_0[S(T)]_n \equiv \boldsymbol{\sigma}[S(T)]_n + 1$

Evaluating $\boldsymbol{\sigma}_P$ from $\boldsymbol{\sigma}_0$ depends on the transaction
type; either contract creation or message call; we define
the tuple of post-execution provisional state $\boldsymbol{\sigma}_P$,
remaining gas $g'$ and substate $A$:

(68) $(\boldsymbol{\sigma}_P, g', A) \equiv \begin{cases} \Lambda(\boldsymbol{\sigma}_0, S(T), T_o, g, T_p, T_v, T_\mathbf{i}, 0) & \text{if } T_t = \varnothing \\ \Theta_3(\boldsymbol{\sigma}_0, S(T), T_o, T_t, T_t, g, T_p, T_v, T_v, T_\mathbf{d}, 0) & \text{otherwise} \end{cases}$

where $g$ is the amount of gas remaining after deducting
the basic amount required to pay for the existence of the
transaction:

(69) $g \equiv T_g - g_0$

and $T_o$ is the original transactor, which can differ from the
sender in the case of a message call or contract creation
not directly triggered by a transaction but coming from
the execution of EVM-code.

Note we use $\Theta_3$ to denote the fact that only the first
three components of the function’s value are taken; the
final represents the message-call’s output value (a byte
array) and is unused in the context of transaction
evaluation.

After the message call or contract creation is processed,
the state is finalised by determining the amount to be
refunded, $g^*$, from the remaining gas, $g'$, plus some allowance
from the refund counter, to the sender at the original rate.

(70) $g^* \equiv g' + \min\{\lfloor \frac{T_g - g'}{2} \rfloor, A_r\}$
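
As a small worked sketch of equation (70), together with the balance updates of equations (72) and (73), the following Python functions compute the refunded gas and the resulting sender and miner balances. The function and parameter names are illustrative only.

```python
def refund_amount(gas_limit: int, gas_remaining: int, refund_counter: int) -> int:
    """g* = g' + min(floor((T_g - g') / 2), A_r)."""
    gas_used = gas_limit - gas_remaining
    return gas_remaining + min(gas_used // 2, refund_counter)


def settle(sender_balance: int, miner_balance: int, gas_limit: int,
           gas_remaining: int, refund_counter: int, gas_price: int):
    """Credit the sender with g* and the miner with the rest, both at T_p."""
    g_star = refund_amount(gas_limit, gas_remaining, refund_counter)
    return (sender_balance + g_star * gas_price,
            miner_balance + (gas_limit - g_star) * gas_price)
```

For a gas limit of 100000 with 40000 gas left over, a refund counter of 50000 is capped at 30000 (half of the 60000 used), so 70000 gas is returned to the sender.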


The total refundable amount is the legitimately remaining
gas $g'$, added to $A_r$, with the latter component being
capped up to a maximum of half (rounded down) of the
total amount used $T_g - g'$.

The Ether for the gas is given to the miner, whose address
is specified as the beneficiary of the present block
$B$. So we define the pre-final state $\boldsymbol{\sigma}^*$ in terms of the
provisional state $\boldsymbol{\sigma}_P$:

(71) $\boldsymbol{\sigma}^* \equiv \boldsymbol{\sigma}_P$ except

(72) $\boldsymbol{\sigma}^*[S(T)]_b \equiv \boldsymbol{\sigma}_P[S(T)]_b + g^* T_p$

(73) $\boldsymbol{\sigma}^*[m]_b \equiv \boldsymbol{\sigma}_P[m]_b + (T_g - g^*) T_p$

(74) $m \equiv B_{H_c}$

The final state, $\boldsymbol{\sigma}'$, is reached after deleting all accounts
that appear in the self-destruct set:

(75) $\boldsymbol{\sigma}' \equiv \boldsymbol{\sigma}^*$ except

(76) $\forall i \in A_\mathbf{s} : \boldsymbol{\sigma}'[i] \equiv \varnothing$

And finally, we specify $\Upsilon^g$, the total gas used in this
transaction and $\Upsilon^\mathbf{l}$, the logs created by this transaction:

(77) $\Upsilon^g(\boldsymbol{\sigma}, T) \equiv T_g - g'$

(78) $\Upsilon^\mathbf{l}(\boldsymbol{\sigma}, T) \equiv A_\mathbf{l}$

These are used to help define the transaction receipt.


7. Contract Creation

There are a number of intrinsic parameters used when
creating an account: sender ($s$), original transactor ($o$),
available gas ($g$), gas price ($p$), endowment ($v$) together
with an arbitrary length byte array, $\mathbf{i}$, the initialisation
EVM code and finally the present depth of the
message-call/contract-creation stack ($e$).

We define the creation function formally as the function
$\Lambda$, which evaluates from these values, together with
the state $\boldsymbol{\sigma}$ to the tuple containing the new state,
remaining gas and accrued transaction substate
$(\boldsymbol{\sigma}', g', A)$, as in section 6:

(79) $(\boldsymbol{\sigma}', g', A) \equiv \Lambda(\boldsymbol{\sigma}, s, o, g, p, v, \mathbf{i}, e)$

The address of the new account is defined as being the
rightmost 160 bits of the Keccak hash of the RLP encoding
of the structure containing only the sender and the
nonce. Thus we define the resultant address for the new
account $a$:

(80) $a \equiv \mathcal{B}_{96..255}(\mathtt{KEC}(\mathtt{RLP}((s, \boldsymbol{\sigma}[s]_n - 1))))$

where $\mathtt{KEC}$ is the Keccak 256-bit hash function, $\mathtt{RLP}$ is
the RLP encoding function, $\mathcal{B}_{a..b}(X)$ evaluates to binary
value containing the bits of indices in the range $[a, b]$ of
the binary data $X$ and $\boldsymbol{\sigma}[x]$ is the address state of $x$ or
$\varnothing$ if none exists. Note we use one fewer than the sender’s
nonce value; we assert that we have incremented the sender
account’s nonce prior to this call, and so the value used
is the sender’s nonce at the beginning of the responsible
transaction or VM operation.

The account’s nonce is initially defined as zero, the
balance as the value passed, the storage as empty and the
code hash as the Keccak 256-bit hash of the empty string;
the sender’s balance is also reduced by the value passed.
Thus the mutated state becomes $\boldsymbol{\sigma}^*$:

(81) $\boldsymbol{\sigma}^* \equiv \boldsymbol{\sigma}$ except:

(82) $\boldsymbol{\sigma}^*[a] \equiv (0, v + v', \mathtt{TRIE}(\varnothing), \mathtt{KEC}(()))$

(83) $\boldsymbol{\sigma}^*[s]_b \equiv \boldsymbol{\sigma}[s]_b - v$

where $v'$ is the account’s pre-existing value, in the event
it was previously in existence:

(84) $v' \equiv \begin{cases} 0 & \text{if } \boldsymbol{\sigma}[a] = \varnothing \\ \boldsymbol{\sigma}[a]_b & \text{otherwise} \end{cases}$

Finally, the account is initialised through the execution
of the initialising EVM code $\mathbf{i}$ according to the execution
model (see section 9). Code execution can effect several
events that are not internal to the execution state: the
account’s storage can be altered, further accounts can be
created and further message calls can be made. As such,
the code execution function $\Xi$ evaluates to a tuple of the
resultant state $\boldsymbol{\sigma}^{**}$, available gas remaining $g^{**}$, the
accrued substate $A$ and the body code of the account $\mathbf{o}$.

(85) $(\boldsymbol{\sigma}^{**}, g^{**}, A, \mathbf{o}) \equiv \Xi(\boldsymbol{\sigma}^*, g, I)$

where $I$ contains the parameters of the execution
environment as defined in section 9, that is:

(86) $I_a \equiv a$

(87) $I_o \equiv o$

(88) $I_p \equiv p$

(89) $I_\mathbf{d} \equiv ()$

(90) $I_s \equiv s$

(91) $I_v \equiv v$

(92) $I_\mathbf{b} \equiv \mathbf{i}$

(93) $I_e \equiv e$

$I_\mathbf{d}$ evaluates to the empty tuple as there is no input
data to this call. $I_H$ has no special treatment and is
determined from the blockchain.

Code execution depletes gas, and gas may not go below
zero, thus execution may exit before the code has come
to a natural halting state. In this (and several other)
exceptional cases we say an out-of-gas (OOG) exception
has occurred: The evaluated state is defined as being the
empty set, $\varnothing$, and the entire create operation should have
no effect on the state, effectively leaving it as it was
immediately prior to attempting the creation.

If the initialization code completes successfully, a final
contract-creation cost is paid, the code-deposit cost,
$c$, proportional to the size of the created contract’s code:

(94) $c \equiv G_{codedeposit} \times |\mathbf{o}|$

If there is not enough gas remaining to pay this, i.e.
$g^{**} < c$, then we also declare an out-of-gas exception.

The gas remaining will be zero in any such exceptional 
condition, i.e. if the creation was conducted as the recep- 
tion of a transaction, then this doesn’t affect payment of 
the intrinsic cost of contract creation; it is paid regardless. 
However, the value of the transaction is not transferred to 
the aborted contract’s address when we are out-of-gas. 

If such an exception does not occur, then the remaining
gas is refunded to the originator and the now-altered
state is allowed to persist. Thus formally, we may specify
the resultant state, gas and substate as $(\boldsymbol{\sigma}', g', A)$ where:

(95) $g' \equiv \begin{cases} 0 & \text{if } F \\ g^{**} - c & \text{otherwise} \end{cases}$

(96) $\boldsymbol{\sigma}' \equiv \begin{cases} \boldsymbol{\sigma} & \text{if } F \\ \boldsymbol{\sigma}^{**} \text{ except: } \boldsymbol{\sigma}'[a]_c = \mathtt{KEC}(\mathbf{o}) & \text{otherwise} \end{cases}$

where

(97) $F \equiv (\boldsymbol{\sigma}^{**} = \varnothing \vee g^{**} < c \vee |\mathbf{o}| > 24576)$

The exception in the determination of $\boldsymbol{\sigma}'$ dictates that
$\mathbf{o}$, the resultant byte sequence from the execution of the
initialisation code, specifies the final body code for the
newly-created account. Note that the 24576 byte limit
for $\mathbf{o}$ exists because a contract creation call can trigger
$O(n)$ cost in terms of reading the code from disk,
preprocessing the code for VM execution, and also adding $O(n)$
data to the Merkle proof for the block’s proof-of-validity.
With higher gas limits that can be caused by dynamic gas
limit rules, this is a greater concern, and is especially
inconvenient with light clients verifying proofs of validity or
invalidity.


Note that the intention is that the result is either a 
successfully created new contract with its endowment, or 
no new contract with no transfer of value. 
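
The failure predicate $F$ and the remaining gas $g'$ of equations (94) to (97) can be sketched as follows. The value 200 for $G_{codedeposit}$ is an assumption taken from the Appendix G fee schedule as recalled here, and the function name and boolean flag are hypothetical.

```python
G_CODEDEPOSIT = 200    # assumed per-byte deposit cost (check Appendix G)
MAX_CODE_SIZE = 24576  # byte limit on the returned body code


def creation_outcome(execution_failed: bool, gas_after_init: int, code: bytes):
    """Return (created, remaining_gas) for a finished initialisation run.

    execution_failed models the case where the evaluated state is empty.
    """
    deposit = G_CODEDEPOSIT * len(code)
    failed = (execution_failed
              or gas_after_init < deposit
              or len(code) > MAX_CODE_SIZE)
    if failed:
        return False, 0  # state reverts and all gas is consumed
    return True, gas_after_init - deposit
```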

7.1. Subtleties. Note that while the initialisation code
is executing, the newly created address exists but with
no intrinsic body code. Thus any message call received
by it during this time causes no code to be executed. If
the initialisation execution ends with a SELFDESTRUCT
instruction, the matter is moot since the account will be
deleted before the transaction is completed. For a normal
STOP code, or if the code returned is otherwise empty,
then the state is left with a zombie account, and any
remaining balance will be locked into the account forever.

8. Message Call 

In the case of executing a message call, several parameters
are required: sender ($s$), transaction originator ($o$),
recipient ($r$), the account whose code is to be executed ($c$,
usually the same as recipient), available gas ($g$), value ($v$)
and gas price ($p$) together with an arbitrary length byte
array, $\mathbf{d}$, the input data of the call and finally the present
depth of the message-call/contract-creation stack ($e$).

Aside from evaluating to a new state and transaction 
substate, message calls also have an extra component — the 
output data denoted by the byte array o. This is ignored 
when executing transactions, however message calls can 
be initiated due to VM-code execution and in this case 
this information is used. 

(98) $(\boldsymbol{\sigma}', g', A, \mathbf{o}) \equiv \Theta(\boldsymbol{\sigma}, s, o, r, c, g, p, v, \tilde{v}, \mathbf{d}, e)$

Note that we need to differentiate between the value that
is to be transferred, $v$, from the value apparent in the
execution context, $\tilde{v}$, for the DELEGATECALL instruction.

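We define $\boldsymbol{\sigma}_1$, the first transitional state as the original
state but with the value transferred from sender to
recipient:

(99) $\boldsymbol{\sigma}_1[r]_b \equiv \boldsymbol{\sigma}[r]_b + v \wedge \boldsymbol{\sigma}_1[s]_b \equiv \boldsymbol{\sigma}[s]_b - v$
unless $s = r$.

Throughout the present work, it is assumed that if
$\boldsymbol{\sigma}_1[r]$ was originally undefined, it will be created as an
account with no code or state and zero balance and nonce.
Thus the previous equation should be taken to mean:

(100) $\boldsymbol{\sigma}_1 \equiv \boldsymbol{\sigma}'_1$ except:

(101) $\boldsymbol{\sigma}_1[s]_b \equiv \boldsymbol{\sigma}'_1[s]_b - v$

(102) and $\boldsymbol{\sigma}'_1 \equiv \boldsymbol{\sigma}$ except:

(103) $\begin{cases} \boldsymbol{\sigma}'_1[r] \equiv (v, 0, \mathtt{KEC}(()), \mathtt{TRIE}(\varnothing)) & \text{if } \boldsymbol{\sigma}[r] = \varnothing \\ \boldsymbol{\sigma}'_1[r]_b \equiv \boldsymbol{\sigma}[r]_b + v & \text{otherwise} \end{cases}$

The account’s associated code (identified as the fragment
whose Keccak hash is $\boldsymbol{\sigma}[c]_c$) is executed according to
the execution model (see section 9). Just as with contract
creation, if the execution halts in an exceptional fashion
(i.e. due to an exhausted gas supply, stack underflow,
invalid jump destination or invalid instruction), then no gas
is refunded to the caller and the state is reverted to the
point immediately prior to balance transfer (i.e. $\boldsymbol{\sigma}$).

(104) $\boldsymbol{\sigma}' \equiv \begin{cases} \boldsymbol{\sigma} & \text{if } \boldsymbol{\sigma}^{**} = \varnothing \\ \boldsymbol{\sigma}^{**} & \text{otherwise} \end{cases}$

(105) $g' \equiv \begin{cases} 0 & \text{if } \boldsymbol{\sigma}^{**} = \varnothing \\ g^{**} & \text{otherwise} \end{cases}$

(106) $(\boldsymbol{\sigma}^{**}, g^{**}, A, \mathbf{o}) \equiv \begin{cases} \Xi_{ECREC}(\boldsymbol{\sigma}_1, g, I) & \text{if } r = 1 \\ \Xi_{SHA256}(\boldsymbol{\sigma}_1, g, I) & \text{if } r = 2 \\ \Xi_{RIP160}(\boldsymbol{\sigma}_1, g, I) & \text{if } r = 3 \\ \Xi_{ID}(\boldsymbol{\sigma}_1, g, I) & \text{if } r = 4 \\ \Xi(\boldsymbol{\sigma}_1, g, I) & \text{otherwise} \end{cases}$

(107) $I_a \equiv r$

(108) $I_o \equiv o$

(109) $I_p \equiv p$

(110) $I_\mathbf{d} \equiv \mathbf{d}$

(111) $I_s \equiv s$

(112) $I_v \equiv \tilde{v}$

(113) $I_e \equiv e$

(114) Let $\mathtt{KEC}(I_\mathbf{b}) = \boldsymbol{\sigma}[c]_c$

It is assumed that the client will have stored the pair
$(\mathtt{KEC}(I_\mathbf{b}), I_\mathbf{b})$ at some point prior in order to make the
determination of $I_\mathbf{b}$ feasible.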
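
The value transfer of equations (99) to (103) can be pictured with the Python sketch below. The state is modelled, purely for illustration, as a dictionary from address to an account dictionary; the function works on a copy, so the untouched original plays the role of the state to revert to on an exceptional halt. It assumes the caller has already checked that the sender can afford the transfer.

```python
def transfer_value(state: dict, sender: str, recipient: str, value: int) -> dict:
    """Return the first transitional state; the input state is not modified."""
    new_state = {addr: dict(acct) for addr, acct in state.items()}
    if recipient not in new_state:
        # An undefined recipient is created with zero balance and nonce
        new_state[recipient] = {"nonce": 0, "balance": 0}
    if sender != recipient:  # no balance change when s = r
        new_state[recipient]["balance"] += value
        new_state[sender]["balance"] -= value
    return new_state
```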

As can be seen, there are four exceptions to the usage
of the general execution framework $\Xi$ for evaluation of the
message call: these are four so-called ‘precompiled’
contracts, meant as a preliminary piece of architecture that
may later become native extensions. The four contracts
in addresses 1, 2, 3 and 4 execute the elliptic curve public
key recovery function, the SHA2 256-bit hash scheme, the
RIPEMD 160-bit hash scheme and the identity function
respectively.
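
The dispatch on the recipient address in equation (106) might be sketched as below. Only the two precompiled contracts that can be expressed with the Python standard library are implemented (SHA-256 at address 2 and identity at address 4); addresses 1 and 3 are left as explicit placeholders, gas accounting is ignored, and `general_execute` is a hypothetical stand-in for the general execution function.

```python
import hashlib


def sha256_precompile(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def identity_precompile(data: bytes) -> bytes:
    return data


def not_implemented(data: bytes) -> bytes:
    # ECREC (address 1) and RIPEMD-160 (address 3) need extra libraries
    raise NotImplementedError("precompile not modelled in this sketch")


PRECOMPILES = {
    1: not_implemented,
    2: sha256_precompile,
    3: not_implemented,
    4: identity_precompile,
}


def run_code(recipient: int, data: bytes, general_execute) -> bytes:
    """Use a precompiled contract for addresses 1 to 4, else the general path."""
    handler = PRECOMPILES.get(recipient)
    if handler is not None:
        return handler(data)
    return general_execute(data)
```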

Their full formal definition is in Appendix 


9. Execution Model 

The execution model specifies how the system state is
altered given a series of bytecode instructions and a small
tuple of environmental data. This is specified through a
formal model of a virtual state machine, known as the
Ethereum Virtual Machine (EVM). It is a quasi-Turing-complete
machine; the quasi qualification comes from the
fact that the computation is intrinsically bounded through
a parameter, gas, which limits the total amount of
computation done.

9.1. Basics. The EVM is a simple stack-based architecture.
The word size of the machine (and thus size of stack
item) is 256-bit. This was chosen to facilitate the Keccak-256
hash scheme and elliptic-curve computations. The
memory model is a simple word-addressed byte array. The
stack has a maximum size of 1024. The machine also has
an independent storage model; this is similar in concept
to the memory but rather than a byte array, it is a
word-addressable word array. Unlike memory, which is volatile,
storage is non-volatile and is maintained as part of the
system state. All locations in both storage and memory
are well-defined initially as zero.
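
To make the word size and stack limit concrete, here is a minimal Python sketch of such a stack. The class and its exception choices are hypothetical; values are reduced modulo $2^{256}$ on entry so that every item fits in one machine word.

```python
WORD_MASK = 2**256 - 1  # largest value of a 256-bit word
STACK_LIMIT = 1024      # maximum number of stack items


class Stack:
    """A bounded stack of 256-bit words."""

    def __init__(self):
        self.items = []

    def push(self, value: int) -> None:
        if len(self.items) >= STACK_LIMIT:
            raise OverflowError("stack limit of 1024 items exceeded")
        self.items.append(value & WORD_MASK)  # keep only the low 256 bits

    def pop(self) -> int:
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()
```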

The machine does not follow the standard von Neumann
architecture. Rather than storing program code in
generally-accessible memory or storage, it is stored
separately in a virtual ROM interactable only through a
specialised instruction.

The machine can have exceptional execution for several 
reasons, including stack underflows and invalid instruc- 
tions. Like the out-of-gas exception, they do not leave 
state changes intact. Rather, the machine halts immedi- 
ately and reports the issue to the execution agent (either 
the transaction processor or, recursively, the spawning ex- 
ecution environment) which will deal with it separately. 

9.2. Fees Overview. Fees (denominated in gas) are
charged under three distinct circumstances, all three as
prerequisite to the execution of an operation. The first
and most common is the fee intrinsic to the computation
of the operation (see Appendix G). Secondly, gas may be
deducted in order to form the payment for a subordinate
message call or contract creation; this forms part of the
payment for CREATE, CALL and CALLCODE. Finally, gas
may be paid due to an increase in the usage of the memory.

Over an account’s execution, the total fee for memory- 
usage payable is proportional to the smallest multiple of 
32 bytes that is required such that all memory indices 
(whether for read or write) are included in the range. This 
is paid for on a just-in-time basis; as such, referencing an 
area of memory at least 32 bytes greater than any previ- 
ously indexed memory will certainly result in an additional 
memory usage fee. Due to this fee it is highly unlikely 
addresses will ever go above 32-bit bounds. That said, 
implementations must be able to manage this eventuality. 
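
The "smallest multiple of 32 bytes" rule can be sketched as a word count that only ever grows. The function below is illustrative; the treatment of zero-length accesses (they never expand memory) is an assumption based on the memory expansion function of Appendix H, and the fee itself is not computed here.

```python
def active_words(current_words: int, offset: int, length: int) -> int:
    """Number of 32-byte words in use after touching [offset, offset + length)."""
    if length == 0:
        return current_words  # assumed: zero-length accesses never expand memory
    needed = (offset + length + 31) // 32  # round up to whole words
    return max(current_words, needed)
```

Reading two bytes starting at offset 31 touches byte 32 and therefore requires a second word, whereas a later access inside the first five words leaves the count unchanged.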

Storage fees have a slightly nuanced behaviour — to in- 
centivise minimisation of the use of storage (which corre- 
sponds directly to a larger state database on all nodes), 
the execution fee for an operation that clears an entry in 
the storage is not only waived, a qualified refund is given; 
in fact, this refund is effectively paid up-front since the 
initial usage of a storage location costs substantially more 
than normal usage. 

See Appendix H for a rigorous definition of the EVM
gas cost.

9.3. Execution Environment. In addition to the system
state $\boldsymbol{\sigma}$, and the remaining gas for computation $g$,
there are several pieces of important information used in
the execution environment that the execution agent must
provide; these are contained in the tuple $I$:

• $I_a$, the address of the account which owns the
code that is executing.

• $I_o$, the sender address of the transaction that
originated this execution.

• $I_p$, the price of gas in the transaction that
originated this execution.

• $I_\mathbf{d}$, the byte array that is the input data to this
execution; if the execution agent is a transaction,
this would be the transaction data.

• $I_s$, the address of the account which caused the
code to be executing; if the execution agent is a
transaction, this would be the transaction sender.

• $I_v$, the value, in Wei, passed to this account as
part of the same procedure as execution; if the
execution agent is a transaction, this would be
the transaction value.

• $I_\mathbf{b}$, the byte array that is the machine code to be
executed.

• $I_H$, the block header of the present block.

• $I_e$, the depth of the present message-call or
contract-creation (i.e. the number of CALLs or
CREATEs being executed at present).

The execution model defines the function $\Xi$, which can
compute the resultant state $\boldsymbol{\sigma}'$, the remaining gas $g'$, the
accrued substate $A$ and the resultant output, $\mathbf{o}$, given
these definitions. For the present context, we will define
it as:

(115) $(\boldsymbol{\sigma}', g', A, \mathbf{o}) \equiv \Xi(\boldsymbol{\sigma}, g, I)$

where we will remember that $A$, the accrued substate,
is defined as the tuple of the self-destruct set $\mathbf{s}$, the log
series $\mathbf{l}$ and the refunds $r$:

(116) $A \equiv (\mathbf{s}, \mathbf{l}, r)$

9.4. Execution Overview. We must now define the $\Xi$
function. In most practical implementations this will be
modelled as an iterative progression of the pair comprising
the full system state, $\boldsymbol{\sigma}$ and the machine state, $\boldsymbol{\mu}$.
Formally, we define it recursively with a function $X$. This
uses an iterator function $O$ (which defines the result of a
single cycle of the state machine) together with functions
$Z$ which determines if the present state is an exceptional
halting state of the machine and $H$, specifying the output
data of the instruction if and only if the present state is a
normal halting state of the machine.

The empty sequence, denoted $()$, is not equal to the
empty set, denoted $\varnothing$; this is important when interpreting
the output of $H$, which evaluates to $\varnothing$ when execution is to
continue but a series (potentially empty) when execution
should halt.


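(117) $\Xi(\boldsymbol{\sigma}, g, I) \equiv (\boldsymbol{\sigma}', \boldsymbol{\mu}'_g, A, \mathbf{o})$

(118) $(\boldsymbol{\sigma}', \boldsymbol{\mu}', A, ..., \mathbf{o}) \equiv X((\boldsymbol{\sigma}, \boldsymbol{\mu}, A^0, I))$

(119) $\boldsymbol{\mu}_g \equiv g$

(120) $\boldsymbol{\mu}_{pc} \equiv 0$

(121) $\boldsymbol{\mu}_\mathbf{m} \equiv (0, 0, ...)$

(122) $\boldsymbol{\mu}_i \equiv 0$

(123) $\boldsymbol{\mu}_\mathbf{s} \equiv ()$

(124) $X((\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)) \equiv \begin{cases} (\varnothing, \boldsymbol{\mu}, A^0, I, ()) & \text{if } Z(\boldsymbol{\sigma}, \boldsymbol{\mu}, I) \\ O(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I) \cdot \mathbf{o} & \text{if } \mathbf{o} \neq \varnothing \\ X(O(\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)) & \text{otherwise} \end{cases}$

where

(125) $\mathbf{o} \equiv H(\boldsymbol{\mu}, I)$

(126) $(a, b, c, d) \cdot e \equiv (a, b, c, d, e)$

Note that, when we evaluate $\Xi$, we drop the fourth
element $I'$ and extract the remaining gas $\boldsymbol{\mu}'_g$ from the
resultant machine state $\boldsymbol{\mu}'$.

$X$ is thus cycled (recursively here, but implementations
are generally expected to use a simple iterative loop) until
either $Z$ becomes true indicating that the present state is
exceptional and that the machine must be halted and any
changes discarded or until $H$ becomes a series (rather than
the empty set) indicating that the machine has reached a
controlled halt.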
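
The iterative loop mentioned above might look like the following Python sketch. The three callables stand in for $Z$, $H$ and $O$ and are supplied by the caller; representing the empty set as `None` and the empty sequence as `b""` is a modelling choice made here, not part of the specification.

```python
EMPTY_SUBSTATE = (frozenset(), (), 0)  # no self-destructs, no logs, zero refund


def execute(state, machine, substate, env, is_exceptional, halt_output, step):
    """Iterative form of X: loop until an exceptional or a normal halt.

    is_exceptional, halt_output and step stand in for Z, H and O.
    halt_output returns None to continue, or a byte string to halt.
    """
    while True:
        if is_exceptional(state, machine, env):
            # Exceptional halt: empty state, empty substate, empty output
            return None, machine, EMPTY_SUBSTATE, env, b""
        # H is evaluated on the machine state before the cycle is applied
        output = halt_output(machine, env)
        state, machine, substate, env = step(state, machine, substate, env)
        if output is not None:
            return state, machine, substate, env, output
```

Note that the output is computed before the final cycle is applied, mirroring the way the second case of (124) appends the output to the result of one more application of the iterator.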


9.4.1. Machine State. The machine state $\boldsymbol{\mu}$ is defined as
the tuple $(g, pc, \mathbf{m}, i, \mathbf{s})$ which are the gas available, the
program counter $pc \in \mathbb{P}_{256}$, the memory contents, the
active number of words in memory (counting continuously
from position 0), and the stack contents. The memory
contents $\boldsymbol{\mu}_\mathbf{m}$ are a series of zeroes of size $2^{256}$.

For the ease of reading, the instruction mnemonics,
written in small-caps (e.g. ADD), should be interpreted
as their numeric equivalents; the full table of instructions
and their specifics is given in Appendix H.

For the purposes of defining $Z$, $H$ and $O$, we define $w$
as the current operation to be executed:

(127) $w \equiv \begin{cases} I_\mathbf{b}[\boldsymbol{\mu}_{pc}] & \text{if } \boldsymbol{\mu}_{pc} < \|I_\mathbf{b}\| \\ \text{STOP} & \text{otherwise} \end{cases}$

We also assume the fixed amounts of $\delta$ and $\alpha$, specifying
the stack items removed and added, both subscriptable
on the instruction and an instruction cost function $C$
evaluating to the full cost, in gas, of executing the given
instruction.

9.4.2. Exceptional Halting. The exceptional halting function
$Z$ is defined as:

(128) $Z(\boldsymbol{\sigma}, \boldsymbol{\mu}, I) \equiv \boldsymbol{\mu}_g < C(\boldsymbol{\sigma}, \boldsymbol{\mu}, I) \vee$
      $\delta_w = \varnothing \vee$
      $\|\boldsymbol{\mu}_\mathbf{s}\| < \delta_w \vee$
      $(w \in \{\text{JUMP}, \text{JUMPI}\} \wedge \boldsymbol{\mu}_\mathbf{s}[0] \notin D(I_\mathbf{b})) \vee$
      $\|\boldsymbol{\mu}_\mathbf{s}\| - \delta_w + \alpha_w > 1024$

This states that the execution is in an exceptional halting
state if there is insufficient gas, if the instruction is invalid
(and therefore its $\delta$ subscript is undefined), if there
are insufficient stack items, if a JUMP/JUMPI destination
is invalid or the new stack size would be larger than 1024.
The astute reader will realise that this implies that no
instruction can, through its execution, cause an exceptional
halt.
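
The five clauses of equation (128) translate almost directly into a predicate. In the Python sketch below, which uses hypothetical parameter names, the instruction's $\delta$ is passed as `None` when the instruction is invalid, and the jump-destination test is assumed to have been evaluated by the caller.

```python
def exceptional_halt(gas: int, cost: int, stack_size: int,
                     removed, added: int, jump_ok: bool = True) -> bool:
    """Z: true if executing the current instruction must halt exceptionally.

    removed is the instruction's delta (None for an invalid instruction),
    added is its alpha, and jump_ok is False for an invalid JUMP/JUMPI target.
    """
    return (gas < cost                                  # insufficient gas
            or removed is None                          # invalid instruction
            or stack_size < removed                     # stack underflow
            or not jump_ok                              # bad jump destination
            or stack_size - removed + added > 1024)     # stack overflow
```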

9.4.3. Jump Destination Validity. We previously used D 
as the function to determine the set of valid jump desti- 
nations given the code that is being run. We define this 
as any position in the code occupied by a JUMPDEST in- 
struction. 

All such positions must be on valid instruction bound- 
aries, rather than sitting in the data portion of PUSH 
operations and must appear within the explicitly defined 
portion of the code (rather than in the implicitly defined 
STOP operations that trail it). 
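
The rule just described amounts to a single left-to-right scan of the code that steps over the immediate data of each PUSH instruction. A minimal Python sketch, using the opcode values 0x5b for JUMPDEST and 0x60 to 0x7f for PUSH1 to PUSH32, could be:

```python
JUMPDEST = 0x5B
PUSH1 = 0x60
PUSH32 = 0x7F


def next_position(i: int, opcode: int) -> int:
    """Position of the next instruction, skipping any PUSH data."""
    if PUSH1 <= opcode <= PUSH32:
        return i + (opcode - PUSH1) + 2  # opcode byte plus 1 to 32 data bytes
    return i + 1


def valid_jump_destinations(code: bytes) -> set:
    """Positions of JUMPDEST bytes that sit on instruction boundaries."""
    destinations = set()
    i = 0
    while i < len(code):
        if code[i] == JUMPDEST:
            destinations.add(i)
        i = next_position(i, code[i])
    return destinations
```

For the code `60 5b 5b 00` only position 2 is a valid destination: the byte at position 1 has the same value but is the data of the preceding PUSH1.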

Formally: 

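(129) $D(\mathbf{c}) \equiv D_J(\mathbf{c}, 0)$

where:

(130) $D_J(\mathbf{c}, i) \equiv \begin{cases} \{\} & \text{if } i \geqslant |\mathbf{c}| \\ \{i\} \cup D_J(\mathbf{c}, N(i, \mathbf{c}[i])) & \text{if } \mathbf{c}[i] = \text{JUMPDEST} \\ D_J(\mathbf{c}, N(i, \mathbf{c}[i])) & \text{otherwise} \end{cases}$

where $N$ is the next valid instruction position in the
code, skipping the data of a PUSH instruction, if any:

(131) $N(i, w) \equiv \begin{cases} i + w - \text{PUSH1} + 2 & \text{if } w \in [\text{PUSH1}, \text{PUSH32}] \\ i + 1 & \text{otherwise} \end{cases}$

9.4.4. Normal Halting. The normal halting function $H$ is
defined:

(132) $H(\boldsymbol{\mu}, I) \equiv \begin{cases} H_{RETURN}(\boldsymbol{\mu}) & \text{if } w = \text{RETURN} \\ () & \text{if } w \in \{\text{STOP}, \text{SELFDESTRUCT}\} \\ \varnothing & \text{otherwise} \end{cases}$

The data-returning halt operation, RETURN, has a
special function $H_{RETURN}$. Note also the difference
between the empty sequence and the empty set as discussed
above.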


9.5. The Execution Cycle. Stack items are added or 
removed from the left-most, lower-indexed portion of the 
series; all other items remain unchanged: 


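(133) $O((\boldsymbol{\sigma}, \boldsymbol{\mu}, A, I)) \equiv (\boldsymbol{\sigma}', \boldsymbol{\mu}', A', I)$

(134) $\Delta \equiv \alpha_w - \delta_w$

(135) $\|\boldsymbol{\mu}'_\mathbf{s}\| \equiv \|\boldsymbol{\mu}_\mathbf{s}\| + \Delta$

(136) $\forall x \in [\alpha_w, \|\boldsymbol{\mu}'_\mathbf{s}\|) : \boldsymbol{\mu}'_\mathbf{s}[x] \equiv \boldsymbol{\mu}_\mathbf{s}[x - \Delta]$

The gas is reduced by the instruction’s gas cost and
for most instructions, the program counter increments on
each cycle; for the three exceptions, we assume a function
$J$, subscripted by one of two instructions, which evaluates
to the according value:

(137) $\boldsymbol{\mu}'_g \equiv \boldsymbol{\mu}_g - C(\boldsymbol{\sigma}, \boldsymbol{\mu}, I)$

(138) $\boldsymbol{\mu}'_{pc} \equiv \begin{cases} J_{JUMP}(\boldsymbol{\mu}) & \text{if } w = \text{JUMP} \\ J_{JUMPI}(\boldsymbol{\mu}) & \text{if } w = \text{JUMPI} \\ N(\boldsymbol{\mu}_{pc}, w) & \text{otherwise} \end{cases}$

In general, we assume the memory, self-destruct set and
system state don’t change:

(139) $\boldsymbol{\mu}'_\mathbf{m} \equiv \boldsymbol{\mu}_\mathbf{m}$

(140) $\boldsymbol{\mu}'_i \equiv \boldsymbol{\mu}_i$

(141) $A' \equiv A$

(142) $\boldsymbol{\sigma}' \equiv \boldsymbol{\sigma}$

However, instructions do typically alter one or several
components of these values. Altered components listed by
instruction are noted in Appendix H, alongside values for
$\alpha$ and $\delta$ and a formal description of the gas requirements.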

10. Blocktree to Blockchain 

The canonical blockchain is a path from root to leaf 
through the entire block tree. In order to have consensus 
over which path it is, conceptually we identify the path 
that has had the most computation done upon it, or, the 
heaviest path. Clearly one factor that helps determine the 
heaviest path is the block number of the leaf, equivalent 
to the number of blocks, not counting the unmined genesis 
block, in the path. The longer the path, the greater the 
total mining effort that must have been done in order to 
arrive at the leaf. This is akin to existing schemes, such 
as that employed in Bitcoin-derived protocols. 

Since a block header includes the difficulty, the header 
alone is enough to validate the computation done. Any 
block contributes toward the total computation or total 
difficulty of a chain. 


Thus we define the total difficulty of block $B$
recursively as:

(143) $B_t \equiv B'_t + B_d$

(144) $B' \equiv P(B_H)$

As such given a block $B$, $B_t$ is its total difficulty, $B'$ is
its parent block and $B_d$ is its difficulty.
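
A minimal Python sketch of the recursion (143), unrolled into a loop, and of choosing the heaviest leaf follows. The block store is a hypothetical dictionary from block identifier to its parent and difficulty, and the genesis block is assumed to contribute its own difficulty with no parent term, a base case the recursion above leaves implicit.

```python
def total_difficulty(block_id, blocks: dict) -> int:
    """Sum of difficulties from the given block back to the genesis block."""
    total = 0
    while block_id is not None:
        block = blocks[block_id]
        total += block["difficulty"]
        block_id = block["parent"]  # None marks the genesis block
    return total


def heaviest_leaf(leaves, blocks: dict):
    """The leaf whose path from the root has the greatest total difficulty."""
    return max(leaves, key=lambda leaf: total_difficulty(leaf, blocks))
```

The usage below illustrates that the heaviest path need not be the longest: a two-block branch with higher difficulty can outweigh a three-block branch.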

11. Block Finalisation 

The process of finalising a block involves four stages: 

(1) Validate (or, if mining, determine) ommers; 

(2) validate (or, if mining, determine) transactions; 

(3) apply rewards; 

(4) verify (or, if mining, compute a valid) state and
nonce.

11.1. Ommer Validation. The validation of ommer
headers means nothing more than verifying that each ommer
header is both a valid header and satisfies the relation
of $N$th-generation ommer to the present block where
$N \leqslant 6$. The maximum of ommer headers is two. Formally:

(145) $\|B_\mathbf{U}\| \leqslant 2 \bigwedge_{U \in B_\mathbf{U}} V(U) \wedge k(U, P(B_H)_H, 6)$

where $k$ denotes the “is-kin” property:

(146) $k(U, H, n) \equiv \begin{cases} \text{false} & \text{if } n = 0 \\ s(U, H) \vee k(U, P(H)_H, n - 1) & \text{otherwise} \end{cases}$

and $s$ denotes the “is-sibling” property:

(147) $s(U, H) \equiv (P(H) = P(U) \wedge H \neq U \wedge U \notin B(H)_\mathbf{U})$

where $B(H)$ is the block of the corresponding header $H$.
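
The “is-kin” and “is-sibling” properties can be sketched in Python as below. Headers are identified by hypothetical string keys in a `headers` dictionary that records each header's parent, and `ommers_of` maps a header to the ommers its block already includes. The extra guard for running past the genesis block is an addition for this sketch and does not appear in equation (146).

```python
def is_sibling(u, h, headers: dict, ommers_of: dict) -> bool:
    """s(U, H): same parent, distinct headers, U not already an ommer of H."""
    return (headers[h]["parent"] == headers[u]["parent"]
            and h != u
            and u not in ommers_of.get(h, ()))


def is_kin(u, h, n: int, headers: dict, ommers_of: dict) -> bool:
    """k(U, H, n): U is a sibling of H or of one of its n - 1 nearest ancestors."""
    if n == 0 or h is None:  # h is None once the walk passes the genesis block
        return False
    return (is_sibling(u, h, headers, ommers_of)
            or is_kin(u, headers[h]["parent"], n - 1, headers, ommers_of))
```

For a present block, the check of equation (145) would start from the parent header with `n = 6`.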

11.2. Transaction Validation. The given gasUsed
must correspond faithfully to the transactions listed:
$B_{H_g}$, the total gas used in the block, must be equal to
the accumulated gas used according to the final
transaction:

(148) $B_{H_g} = \ell(\mathbf{R})_u$

11.3. Reward Application. The application of rewards
to a block involves raising the balance of the accounts of
the beneficiary address of the block and each ommer by a
certain amount. We raise the block’s beneficiary account
by $R_b$; for each ommer, we raise the block’s beneficiary by
an additional $\frac{1}{32}$ of the block reward and the beneficiary of
the ommer gets rewarded depending on the block number.
Formally we define the function $\Omega$, the block-finalisation
state transition function (a function that rewards a nominated
party):

(149) $\Omega(B, \sigma) \equiv \sigma' : \sigma' = \sigma$ except:

(150) $\sigma'[B_{H_c}]_b = \sigma[B_{H_c}]_b + \left(1 + \frac{\|B_U\|}{32}\right) R_b$

(151) $\forall U \in B_U : \sigma'[U_c]_b = \sigma[U_c]_b + \left(1 + \frac{1}{8}(U_i - B_{H_i})\right) R_b$


If there are collisions of the beneficiary addresses be- 
tween ommers and the block (i.e. two ommers with the 
same beneficiary address or an ommer with the same bene- 
ficiary address as the present block), additions are applied 
cumulatively. 

We define the block reward as 5 Ether:

(152) Let $R_b = 5 \times 10^{18}$
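
A minimal sketch of the reward rules (150) to (152), under stated assumptions: balances are held in a plain dictionary keyed by a hypothetical address string, each ommer is given as a `(beneficiary, block_number)` pair, and amounts are integers denominated in Wei. Because $5 \times 10^{18}$ is divisible by both 8 and 32, the integer divisions below are exact.

```python
from typing import Dict, List, Tuple

R_B = 5 * 10**18  # block reward in Wei, equation (152)


def apply_rewards(
    balances: Dict[str, int],
    beneficiary: str,
    block_number: int,
    ommers: List[Tuple[str, int]],
) -> Dict[str, int]:
    """Return a new balance map with block and ommer rewards added."""
    out = dict(balances)
    # (150): full reward plus 1/32 of it for each included ommer.
    out[beneficiary] = out.get(beneficiary, 0) + R_B + len(ommers) * R_B // 32
    # (151): each ommer beneficiary loses 1/8 of the reward per block of age.
    for address, number in ommers:
        out[address] = out.get(address, 0) + R_B + (number - block_number) * R_B // 8
    return out


# One ommer, one block older than the including block.
after = apply_rewards({}, "miner", 100, [("ommer_miner", 99)])
# Collisions are cumulative: here the miner also mined the ommer.
collided = apply_rewards({}, "miner", 100, [("miner", 99)])
```

Applying the additions one after another is what makes colliding beneficiary addresses accumulate, as described above.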


11.4. State & Nonce Validation. We may now define
the function $\Gamma$, which maps a block $B$ to its initiation state:

(153) $\Gamma(B) \equiv \begin{cases} \sigma_0 & \text{if } P(B_H) = \varnothing \\ \sigma_i : \mathtt{TRIE}(L_S(\sigma_i)) = P(B_H)_{H_r} & \text{otherwise} \end{cases}$

Here, $\mathtt{TRIE}(L_S(\sigma_i))$ means the hash of the root node of
a trie of state $\sigma_i$; it is assumed that implementations will
store this in the state database, which is trivial and efficient since
the trie is by nature an immutable data structure. And
finally we define $\Phi$, the block transition function, which
maps an incomplete block $B$ to a complete block $B'$:

(154) $\Phi(B) \equiv B' : B' = B^*$ except:

(155) $B'_n = n : x \leqslant \frac{2^{256}}{H_d}$

(156) $B'_m = m$ with $(x, m) = \mathtt{PoW}(B^*_{\not{n}}, n, d)$

(157) $B^* \equiv B$ except: $B^*_r = r(\Pi(\Gamma(B), B))$

With $d$ being a dataset as specified in appendix J.

As specified at the beginning of the present work, $\Pi$ is
the state-transition function, which is defined in terms of
$\Omega$, the block finalisation function, and $\Upsilon$, the
transaction-evaluation function, both now well-defined.

As previously detailed, $R[n]_\sigma$, $R[n]_l$ and $R[n]_u$ are the
nth corresponding states, logs and cumulative gas used after
each transaction ($R[n]_b$, the fourth component in the
tuple, has already been defined in terms of the logs). The
former is defined simply as the state resulting from applying
the corresponding transaction to the state resulting
from the previous transaction (or the block’s initial state
in the case of the first such transaction):


(158) $R[n]_\sigma \equiv \begin{cases} \Gamma(B) & \text{if } n < 0 \\ \Upsilon(R[n - 1]_\sigma, B_T[n]) & \text{otherwise} \end{cases}$


In the case of $B_R[n]_u$, we take a similar approach, defining
each item as the gas used in evaluating the corresponding
transaction summed with the previous item (or zero,
if it is the first), giving us a running total:

(159) $R[n]_u \equiv \begin{cases} 0 & \text{if } n < 0 \\ \Upsilon^g(R[n - 1]_\sigma, B_T[n]) + R[n - 1]_u & \text{otherwise} \end{cases}$

For $R[n]_l$, we utilise the $\Upsilon^l$ function that we
conveniently defined in the transaction execution function.

(160) $R[n]_l \equiv \Upsilon^l(R[n - 1]_\sigma, B_T[n])$

Finally, we define $\Pi$ as the new state given the block
reward function $\Omega$ applied to the final transaction’s
resultant state, $\ell(R)_\sigma$:

(161) $\Pi(\sigma, B) \equiv \Omega(B, \ell(R)_\sigma)$

Thus the complete block-transition mechanism is de- 
fined, except for PoW, the proof-of-work function. 
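
Equations (158) to (161) describe a left fold over the block's transactions followed by a finalisation step. The sketch below shows only that shape. The callables `apply_tx` (standing in for the transaction-evaluation function together with its gas and log components) and `finalise` (standing in for the block-finalisation function) are hypothetical placeholders supplied by the caller, not part of any real client API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")
Tx = TypeVar("Tx")
Receipt = Tuple[State, int, list]  # (state after tx, cumulative gas, logs)


def run_block(
    initial_state: State,
    transactions: Sequence[Tx],
    apply_tx: Callable[[State, Tx], Tuple[State, int, list]],
    finalise: Callable[[State], State],
) -> Tuple[State, List[Receipt]]:
    """Fold transactions over the state, tracking cumulative gas, then finalise."""
    state = initial_state      # R[-1]_sigma: the block's initial state
    cumulative_gas = 0         # R[-1]_u: zero before the first transaction
    receipts: List[Receipt] = []
    for tx in transactions:
        state, gas_used, logs = apply_tx(state, tx)
        cumulative_gas += gas_used
        receipts.append((state, cumulative_gas, logs))
    # (161): finalisation is applied to the last transaction's resultant state.
    return finalise(state), receipts


# Toy model: the state is an integer and each transaction adds to it.
def toy_apply(state: int, tx: int) -> Tuple[int, int, list]:
    return state + tx, 21000, ["applied %d" % tx]


final_state, receipts = run_block(0, [1, 2, 3], toy_apply, lambda s: s + 5)
```

The cumulative gas of the last receipt is the value that the header's gasUsed field must match in section 11.2.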


11.5. Mining Proof-of-Work. The mining proof-of-work
(PoW) exists as a cryptographically secure nonce
that proves beyond reasonable doubt that a particular
amount of computation has been expended in the determination
of some token value $n$. It is utilised to enforce
the blockchain security by giving meaning and credence
to the notion of difficulty (and, by extension, total
difficulty). However, since mining new blocks comes with
an attached reward, the proof-of-work not only functions
as a method of securing confidence that the blockchain
will remain canonical into the future, but also as a wealth
distribution mechanism.

For both reasons, there are two important goals of the 
proof-of-work function; firstly, it should be as accessible as 
possible to as many people as possible. The requirement 
of, or reward from, specialised and uncommon hardware 
should be minimised. This makes the distribution model 
as open as possible, and, ideally, makes the act of mining a 
simple swap from electricity to Ether at roughly the same 
rate for anyone around the world. 

Secondly, it should not be possible to make super-linear 
profits, and especially not so with a high initial barrier. 
Such a mechanism allows a well-funded adversary to gain 
a troublesome amount of the network’s total mining power 
and as such gives them a super-linear reward (thus skew- 
ing distribution in their favour) as well as reducing the 
network security. 

One plague of the Bitcoin world is ASICs. These are 
specialised pieces of compute hardware that exist only to 
do a single task. In Bitcoin’s case the task is the SHA256 
hash function. While ASICs exist for a proof-of-work func- 
tion, both goals are placed in jeopardy. Because of this, 
a proof-of-work function that is ASIC-resistant (i.e. diffi- 
cult or economically inefficient to implement in specialised 
compute hardware) has been identified as the proverbial 
silver bullet. 

Two directions exist for ASIC resistance; firstly make 
it sequential memory-hard, i.e. engineer the function such 
that the determination of the nonce requires a lot of memory
and bandwidth such that the memory cannot be used
in parallel to discover multiple nonces simultaneously. The 
second is to make the type of computation it would need 
to do general-purpose; the meaning of “specialised hard- 
ware” for a general-purpose task set is, naturally, general 
purpose hardware and as such commodity desktop com- 
puters are likely to be pretty close to “specialised hard- 
ware” for the task. For Ethereum 1.0 we have chosen the 
first path. 

More formally, the proof-of-work function takes the
form of PoW:

(162) $m = H_m \wedge n \leqslant \frac{2^{256}}{H_d}$ with $(m, n) = \mathtt{PoW}(H_{\not{n}}, H_n, d)$

Where $H_{\not{n}}$ is the new block’s header but without the
nonce and mix-hash components; $H_n$ is the nonce of the
block header; $d$ is a large data set needed to compute
the mixHash and $H_d$ is the new block’s difficulty value
(i.e. the block difficulty from section 10). PoW is the
proof-of-work function which evaluates to an array with
the first item being the mixHash and the second item being
a pseudo-random number cryptographically dependent
on $H$ and $d$. The underlying algorithm is called Ethash
and is described below.
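
The acceptance condition in (162) is a comparison against a difficulty-dependent threshold. A minimal sketch, assuming the two outputs of PoW are already available (the mix-hash as bytes and the pseudo-random result as an integer); the function and parameter names are hypothetical:

```python
def pow_is_valid(mix_hash: bytes, result: int, header_mix_hash: bytes, difficulty: int) -> bool:
    """Check m = H_m and n <= 2^256 / H_d for the outputs (m, n) of PoW.

    For an integer result, comparing against the floor of 2^256 / H_d is
    equivalent to comparing against the exact quotient.
    """
    return mix_hash == header_mix_hash and result <= 2**256 // difficulty


# With difficulty 2, exactly the lower half of the 256-bit range is accepted.
ok = pow_is_valid(b"\x01" * 32, 2**255, b"\x01" * 32, 2)
too_large = pow_is_valid(b"\x01" * 32, 2**255 + 1, b"\x01" * 32, 2)
```

Since the result is pseudo-random over the 256-bit range, roughly one attempt in $H_d$ passes the check, which is what ties the difficulty value to expected work.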


11.5.1. Ethash. Ethash is the PoW algorithm for
Ethereum 1.0. It is the latest version of Dagger-Hashimoto,
introduced by Buterin [2013b] and Dryja
[2014], although it can no longer appropriately be called
that since many of the original features of both algorithms
have been drastically changed in the last month of research
and development. The general route that the algorithm
takes is as follows:

There exists a seed which can be computed for each
block by scanning through the block headers up until that
point. From the seed, one can compute a pseudorandom
cache, $J_{cacheinit}$ bytes in initial size. Light clients store
the cache. From the cache, we can generate a dataset,
$J_{datasetinit}$ bytes in initial size, with the property that
each item in the dataset depends on only a small number
of items from the cache. Full clients and miners store the
dataset. The dataset grows linearly with time.

Mining involves grabbing random slices of the dataset
and hashing them together. Verification can be done with
low memory by using the cache to regenerate the specific
pieces of the dataset that you need, so you only need to
store the cache. The large dataset is updated once every
$J_{epoch}$ blocks, so the vast majority of a miner’s effort
will be reading the dataset, not making changes to it. The
mentioned parameters as well as the algorithm are explained
in detail in appendix J.

12. Implementing Contracts 

There are several patterns of contracts engineering that 
allow particular useful behaviours; two of these that I will 
briefly discuss are data feeds and random numbers. 

12.1. Data Feeds. A data feed contract is one which pro- 
vides a single service: it gives access to information from 
the external world within Ethereum. The accuracy and 
timeliness of this information is not guaranteed and it is 
the task of a secondary contract author — the contract that 
utilises the data feed — to determine how much trust can 
be placed in any single data feed. 

The general pattern involves a single contract within 
Ethereum which, when given a message call, replies with 
some timely information concerning an external phenom- 
enon. An example might be the local temperature of 
New York City. This would be implemented as a contract
that returns the value of some known point in storage.
Of course this point in storage must be maintained with
the correct temperature, and thus the second part
of the pattern would be for an external server to run an
Ethereum node and, immediately on discovery of a new
block, create a new valid transaction, sent to the contract,
updating said value in storage. The contract’s code would
accept such updates only from the identity contained on
said server.

12.2. Random Numbers. Providing random numbers 
within a deterministic system is, naturally, an impossible 
task. However, we can approximate with pseudo-random 
numbers by utilising data which is generally unknowable 
at the time of transacting. Such data might include the 
block’s hash, the block’s timestamp and the block’s beneficiary
address. In order to make it hard for a malicious miner
to control those values, one should use the BLOCKHASH
operation in order to use hashes of the previous 256 blocks
as pseudo-random numbers. For a series of such numbers,
a trivial solution would be to add some constant amount
and hash the result.
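
The "add a constant and hash" idea for producing a series can be sketched off-chain as follows. This is an illustration only: it runs in Python rather than in the EVM, the function name is hypothetical, and `hashlib.sha3_256` is used as a stand-in hash (it is standardised SHA3-256, which is not the same function as the Keccak-256 used by Ethereum).

```python
import hashlib
from typing import List


def random_series(block_hash: bytes, count: int, constant: int = 1) -> List[int]:
    """Derive `count` pseudo-random 256-bit integers from a 32-byte block hash.

    Each step adds `constant` (mod 2^256) to the previous value and hashes
    the result, as suggested in the text above.
    """
    values: List[int] = []
    current = int.from_bytes(block_hash, "big")
    for _ in range(count):
        current = (current + constant) % 2**256
        digest = hashlib.sha3_256(current.to_bytes(32, "big")).digest()
        current = int.from_bytes(digest, "big")
        values.append(current)
    return values


seed = bytes(range(32))  # placeholder for a real block hash
series = random_series(seed, 4)
```

The series is fully determined by the seed, so its unpredictability is no better than that of the block hash it starts from.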


13. Future Directions 

The state database won’t be forced to maintain all past 
state trie structures into the future. It should maintain an 
age for each node and eventually discard nodes that are 
neither recent enough nor checkpoints; checkpoints, or a 
set of nodes in the database that allow a particular block’s 
state trie to be traversed, could be used to place a maxi- 
mum limit on the amount of computation needed in order 
to retrieve any state throughout the blockchain. 

Blockchain consolidation could be used in order to re- 
duce the amount of blocks a client would need to download 
to act as a full, mining, node. A compressed archive of the 
trie structure at given points in time (perhaps one in every 
10,000th block) could be maintained by the peer network, 
effectively recasting the genesis block. This would reduce 
the amount to be downloaded to a single archive plus a 
hard maximum limit of blocks. 

Finally, blockchain compression could perhaps be conducted:
nodes in the state trie that haven’t sent/received a
transaction in some constant number of blocks could be
thrown out, reducing both Ether-leakage and the growth
of the state database.

13.1. Scalability. Scalability remains an eternal con- 
cern. With a generalised state transition function, it be- 
comes difficult to partition and parallelise transactions 
to apply the divide-and-conquer strategy. Unaddressed, 
the dynamic value-range of the system remains essentially 
fixed and as the average transaction value increases, the 
less valuable of them become ignored, being economically 
pointless to include in the main ledger. However, several 
strategies exist that may potentially be exploited to pro- 
vide a considerably more scalable protocol. 

Some form of hierarchical structure, achieved by either 
consolidating smaller lighter-weight chains into the main 
block or building the main block through the incremen- 
tal combination and adhesion (through proof-of-work) of 
smaller transaction sets may allow parallelisation of trans- 
action combination and block-building. Parallelism could 
also come from a prioritised set of parallel blockchains, 
consolidating each block and with duplicate or invalid 
transactions thrown out accordingly. 

Finally, verifiable computation, if made generally avail- 
able and efficient enough, may provide a route to allow the 
proof-of-work to be the verification of final state. 

14. Conclusion 

I have introduced, discussed and formally defined the 
protocol of Ethereum. Through this protocol the reader 
may implement a node on the Ethereum network and join 
others in a decentralised secure social operating system. 
Contracts may be authored in order to algorithmically 
specify and autonomously enforce rules of interaction. 

Appendix A. Terminology

External Actor: A person or other entity able to interface to an Ethereum node, but external to the world of 
Ethereum. It can interact with Ethereum through depositing signed Transactions and inspecting the blockchain 
and associated state. Has one (or more) intrinsic Accounts. 

Address: A 160-bit code used for identifying Accounts. 

Account: Accounts have an intrinsic balance and transaction count maintained as part of the Ethereum state. 
They also have some (possibly empty) EVM Code and a (possibly empty) Storage State associated with them. 
Though homogenous, it makes sense to distinguish between two practical types of account: those with empty 
associated EVM Code (thus the account balance is controlled, if at all, by some external entity) and those with 
non-empty associated EVM Code (thus the account represents an Autonomous Object). Each Account has a 
single Address that identifies it. 

Transaction: A piece of data, signed by an External Actor. It represents either a Message or a new Autonomous 
Object. Transactions are recorded into each block of the blockchain. 

Autonomous Object: A notional object existent only within the hypothetical state of Ethereum. Has an intrinsic 
address and thus an associated account; the account will have non-empty associated EVM Code. Incorporated 
only as the Storage State of that account. 

Storage State: The information particular to a given Account that is maintained between the times that the 
Account’s associated EVM Code runs. 

Message: Data (as a set of bytes) and Value (specified as Ether) that is passed between two Accounts, either 
through the deterministic operation of an Autonomous Object or the cryptographically secure signature of the 
Transaction. 

Message Call: The act of passing a message from one Account to another. If the destination account is associated 
with non-empty EVM Code, then the VM will be started with the state of said Object and the Message acted 
upon. If the message sender is an Autonomous Object, then the Call passes any data returned from the VM 
operation. 

Gas: The fundamental network cost unit. Paid for exclusively by Ether (as of PoC-4), which is converted freely 
to and from Gas as required. Gas does not exist outside of the internal Ethereum computation engine; its price 
is set by the Transaction and miners are free to ignore Transactions whose Gas price is too low. 

Contract: Informal term used to mean both a piece of EVM Code that may be associated with an Account or an 
Autonomous Object. 

Object: Synonym for Autonomous Object. 

App: An end-user-visible application hosted in the Ethereum Browser. 

Ethereum Browser: (aka Ethereum Reference Client) A cross-platform GUI of an interface similar to a simplified 
browser (a la Chrome) that is able to host sandboxed applications whose backend is purely on the Ethereum 
protocol. 

Ethereum Virtual Machine: (aka EVM) The virtual machine that forms the key part of the execution model 
for an Account’s associated EVM Code. 

Ethereum Runtime Environment: (aka ERE) The environment which is provided to an Autonomous Object 
executing in the EVM. Includes the EVM but also the structure of the world state on which the EVM relies for 
certain I/O instructions including CALL & CREATE. 

EVM Code: The bytecode that the EVM can natively execute. Used to formally specify the meaning and rami- 
fications of a message to an Account. 

EVM Assembly: The human-readable form of EVM-code. 

LLL: The Lisp-like Low-level Language, a human-writable language used for authoring simple contracts and general 
low-level language toolkit for trans-compiling to. 


Appendix B. Recursive Length Prefix 


This is a serialisation method for encoding arbitrarily structured binary data (byte arrays). 
We define the set of possible structures T: 


(163) $\mathbb{T} \equiv \mathbb{L} \cup \mathbb{B}$

(164) $\mathbb{L} \equiv \{ t : t = (t[0], t[1], ...) \wedge \forall_{n < \|t\|} \; t[n] \in \mathbb{T} \}$

(165) $\mathbb{B} \equiv \{ b : b = (b[0], b[1], ...) \wedge \forall_{n < \|b\|} \; b[n] \in \mathbb{O} \}$


Where $\mathbb{O}$ is the set of bytes. Thus $\mathbb{B}$ is the set of all sequences of bytes (otherwise known as byte-arrays, and a leaf if
imagined as a tree), $\mathbb{L}$ is the set of all tree-like (sub-)structures that are not a single leaf (a branch node if imagined as
a tree) and $\mathbb{T}$ is the set of all byte-arrays and such structural sequences.

We define the RLP function as RLP through two sub-functions, the first handling the instance when the value is a
byte array, the second when it is a sequence of further values:

(166) $\mathtt{RLP}(x) \equiv \begin{cases} R_b(x) & \text{if } x \in \mathbb{B} \\ R_l(x) & \text{otherwise} \end{cases}$


If the value to be serialised is a byte-array, the RLP serialisation takes one of three forms: 


• If the byte-array contains solely a single byte and that single byte is less than 128, then the input is exactly 
equal to the output. 

• If the byte-array contains fewer than 56 bytes, then the output is equal to the input prefixed by the byte equal 
to the length of the byte array plus 128. 

• Otherwise, the output is equal to the input prefixed by the minimal-length byte-array which when interpreted 
as a big-endian integer is equal to the length of the input byte array, which is itself prefixed by the number of 
bytes required to faithfully encode this length value plus 183. 


Formally, we define $R_b$:

(167) $R_b(x) \equiv \begin{cases} x & \text{if } \|x\| = 1 \wedge x[0] < 128 \\ (128 + \|x\|) \cdot x & \text{else if } \|x\| < 56 \\ (183 + \|\mathtt{BE}(\|x\|)\|) \cdot \mathtt{BE}(\|x\|) \cdot x & \text{otherwise} \end{cases}$

(168) $\mathtt{BE}(x) \equiv (b_0, b_1, ...) : b_0 \neq 0 \wedge x = \sum_{n = 0}^{n < \|b\|} b_n \cdot 256^{\|b\| - 1 - n}$

(169) $(a) \cdot (b, c) \cdot (d, e) = (a, b, c, d, e)$


Thus BE is the function that expands a positive integer value to a big-endian byte array of minimal length and the 
dot operator performs sequence concatenation. 

If instead, the value to be serialised is a sequence of other items then the RLP serialisation takes one of two forms: 

• If the concatenated serialisations of each contained item is less than 56 bytes in length, then the output is equal 
to that concatenation prefixed by the byte equal to the length of this byte array plus 192. 

• Otherwise, the output is equal to the concatenated serialisations prefixed by the minimal-length byte-array 
which when interpreted as a big-endian integer is equal to the length of the concatenated serialisations byte 
array, which is itself prefixed by the number of bytes required to faithfully encode this length value plus 247. 

Thus we finish by formally defining $R_l$:

(170) $R_l(x) \equiv \begin{cases} (192 + \|s(x)\|) \cdot s(x) & \text{if } \|s(x)\| < 56 \\ (247 + \|\mathtt{BE}(\|s(x)\|)\|) \cdot \mathtt{BE}(\|s(x)\|) \cdot s(x) & \text{otherwise} \end{cases}$

(171) $s(x) \equiv \mathtt{RLP}(x_0) \cdot \mathtt{RLP}(x_1) ...$

If RLP is used to encode a scalar, defined only as a positive integer ($\mathbb{P}$ or any $x$ for $\mathbb{P}_x$), it must be specified as the
shortest byte array such that the big-endian interpretation of it is equal. Thus the RLP of some positive integer $i$ is
defined as:

(172) $\mathtt{RLP}(i : i \in \mathbb{P}) \equiv \mathtt{RLP}(\mathtt{BE}(i))$

When interpreting RLP data, if an expected fragment is decoded as a scalar and leading zeroes are found in the byte 
sequence, clients are required to consider it non-canonical and treat it in the same manner as otherwise invalid RLP 
data, dismissing it completely. 
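
The encoding rules (166) to (172) fit in a short Python function. This is a minimal sketch, not a reference implementation: byte arrays are modelled as `bytes`, sequences as Python lists, and scalars as non-negative `int` values. Treating zero as the empty byte array follows from the minimal-length rule of BE and is stated here as an assumption, since the text defines BE only for positive integers.

```python
from typing import List, Union

Item = Union[bytes, int, List["Item"]]


def be(x: int) -> bytes:
    """Minimal-length big-endian byte array; zero maps to the empty array (assumption)."""
    if x < 0:
        raise ValueError("RLP scalars must be non-negative")
    return x.to_bytes((x.bit_length() + 7) // 8, "big")


def rlp(x: Item) -> bytes:
    """Serialise a byte array, a scalar, or a (nested) list of such items."""
    if isinstance(x, int):                       # (172): scalars go through BE
        return rlp(be(x))
    if isinstance(x, (bytes, bytearray)):        # (167): R_b
        data = bytes(x)
        if len(data) == 1 and data[0] < 128:
            return data
        if len(data) < 56:
            return bytes([128 + len(data)]) + data
        length = be(len(data))
        return bytes([183 + len(length)]) + length + data
    payload = b"".join(rlp(item) for item in x)  # (171): s(x)
    if len(payload) < 56:                        # (170): R_l
        return bytes([192 + len(payload)]) + payload
    length = be(len(payload))
    return bytes([247 + len(length)]) + length + payload


encoded = rlp([b"cat", b"dog"])
```

For example, the three-byte string `dog` is prefixed with `0x83` (128 + 3), and the two-item list above gets the prefix `0xc8` (192 + 8) in front of the two concatenated item encodings.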

There is no specific canonical encoding format for signed or floating-point values. 


Appendix C. Hex-Prefix Encoding 

Hex-prefix encoding is an efficient method of encoding an arbitrary number of nibbles as a byte array. It is able to 
store an additional flag which, when used in the context of the trie (the only context in which it is used), disambiguates 
between node types. 

It is defined as the function HP which maps from a sequence of nibbles (represented by the set $\mathbb{Y}$) together with a
boolean value to a sequence of bytes (represented by the set $\mathbb{B}$):

(173) $\mathtt{HP}(x, t) : x \in \mathbb{Y} \equiv \begin{cases} (16 f(t), 16 x[0] + x[1], 16 x[2] + x[3], ...) & \text{if } \|x\| \text{ is even} \\ (16 (f(t) + 1) + x[0], 16 x[1] + x[2], 16 x[3] + x[4], ...) & \text{otherwise} \end{cases}$

(174) $f(t) \equiv \begin{cases} 2 & \text{if } t \neq 0 \\ 0 & \text{otherwise} \end{cases}$


Thus the high nibble of the first byte contains two flags; the lowest bit encoding the oddness of the length and the 
second-lowest encoding the flag t. The low nibble of the first byte is zero in the case of an even number of nibbles and the 
first nibble in the case of an odd number. All remaining nibbles (now an even number) fit properly into the remaining 
bytes. 
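
A direct transcription of (173) and (174) into Python may help when checking an implementation by hand. The function name is hypothetical; nibbles are modelled as a list of integers in the range 0 to 15.

```python
from typing import List


def hex_prefix(nibbles: List[int], flag: bool) -> bytes:
    """HP(x, t): pack nibbles into bytes, with parity and flag in the first nibble."""
    f = 2 if flag else 0                          # (174)
    if len(nibbles) % 2 == 0:
        out = [16 * f]                            # low nibble of first byte is zero
        rest = nibbles
    else:
        out = [16 * (f + 1) + nibbles[0]]         # low nibble carries the first nibble
        rest = nibbles[1:]
    # `rest` now has even length; pack it two nibbles per byte.
    out.extend(16 * rest[i] + rest[i + 1] for i in range(0, len(rest), 2))
    return bytes(out)


odd_with_flag = hex_prefix([0xF, 0x1, 0xC, 0xB, 0x8], True)
```

In the example the sequence has odd length and the flag is set, so the first byte is `0x3f`: high nibble 3 (flag bit plus oddness bit), low nibble `f` (the first nibble of the input).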

Appendix D. Modified Merkle Patricia Tree


The modified Merkle Patricia tree (trie) provides a persistent data structure to map between arbitrary-length binary 
data (byte arrays). It is defined in terms of a mutable data structure to map between 256-bit binary fragments and 
arbitrary-length binary data, typically implemented as a database. The core of the trie, and its sole requirement in terms 
of the protocol specification is to provide a single value that identifies a given set of key-value pairs, which may be either 
a 32 byte sequence or the empty byte sequence. It is left as an implementation consideration to store and maintain the 
structure of the trie in a manner that allows effective and efficient realisation of the protocol. 

Formally, we assume the input value $\mathfrak{I}$, a set containing pairs of byte sequences:

(175) $\mathfrak{I} = \{ (k_0 \in \mathbb{B}, v_0 \in \mathbb{B}), (k_1 \in \mathbb{B}, v_1 \in \mathbb{B}), ... \}$

When considering such a sequence, we use the common numeric subscript notation to refer to a tuple’s key or value,
thus:

(176) $\forall I \in \mathfrak{I} : I \equiv (I_0, I_1)$

Any series of bytes may also trivially be viewed as a series of nibbles, given an endian-specific notation; here we 
assume big-endian. Thus: 


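(177) $y(\mathfrak{I}) = \{ (k'_0 \in \mathbb{Y}, v_0 \in \mathbb{B}), (k'_1 \in \mathbb{Y}, v_1 \in \mathbb{B}), ... \}$

(178) $\forall_n \; \forall_{i : i < 2 \|k_n\|} \; k'_n[i] \equiv \begin{cases} \lfloor k_n[i \div 2] \div 16 \rfloor & \text{if } i \text{ is even} \\ k_n[\lfloor i \div 2 \rfloor] \bmod 16 & \text{otherwise} \end{cases}$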
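
The byte-to-nibble view of (178) is a two-line loop in Python. A small sketch, with a hypothetical function name:

```python
from typing import List


def to_nibbles(key: bytes) -> List[int]:
    """Split each byte into its high nibble followed by its low nibble (big-endian)."""
    nibbles: List[int] = []
    for byte in key:
        nibbles.append(byte // 16)   # even index: high nibble
        nibbles.append(byte % 16)    # odd index: low nibble
    return nibbles


path = to_nibbles(b"\x12\xab")
```

Here the two-byte key `0x12ab` becomes the four-nibble path 1, 2, 10, 11, which is the form in which keys are consumed during trie traversal.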


We define the function TRIE, which evaluates to the root of the trie that represents this set when encoded in this 
structure: 


(179) $\mathtt{TRIE}(\mathfrak{I}) \equiv \mathtt{KEC}(c(\mathfrak{I}, 0))$


We also assume a function n, the trie’s node cap function. When composing a node, we use RLP to encode the 
structure. As a means of reducing storage complexity, for nodes whose composed RLP is fewer than 32 bytes, we store 
the RLP directly; for those larger we assert prescience of the byte array whose Keccak hash evaluates to our reference. 
Thus we define in terms of c, the node composition function: 

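(180) $n(\mathfrak{I}, i) \equiv \begin{cases} () & \text{if } \mathfrak{I} = \varnothing \\ c(\mathfrak{I}, i) & \text{if } \|c(\mathfrak{I}, i)\| < 32 \\ \mathtt{KEC}(c(\mathfrak{I}, i)) & \text{otherwise} \end{cases}$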


In a manner similar to a radix tree, when the trie is traversed from root to leaf, one may build a single key-value 
pair. The key is accumulated through the traversal, acquiring a single nibble from each branch node (just as with a 
radix tree). Unlike a radix tree, in the case of multiple keys sharing the same prefix or in the case of a single key having 
a unique suffix, two optimising nodes are provided. Thus while traversing, one may potentially acquire multiple nibbles 
from each of the other two node types, extension and leaf. There are three kinds of nodes in the trie: 

Leaf: A two-item structure whose first item corresponds to the nibbles in the key not already accounted for by the 
accumulation of keys and branches traversed from the root. The hex-prefix encoding method is used and the 
second parameter to the function is required to be true. 

Extension: A two-item structure whose first item corresponds to a series of nibbles of size greater than one that 
are shared by at least two distinct keys past the accumulation of the keys of nibbles and the keys of branches 
as traversed from the root. The hex-prefix encoding method is used and the second parameter to the function 
is required to be false. 

Branch: A 17-item structure whose first sixteen items correspond to each of the sixteen possible nibble values for 
the keys at this point in their traversal. The 17th item is used in the case of this being a terminator node and 
thus a key being ended at this point in its traversal. 

A branch is then only used when necessary; no branch nodes may exist that contain only a single non-zero entry. We 
may formally define this structure with the structural composition function c: 

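(181) $c(\mathfrak{I}, i) \equiv \begin{cases} \mathtt{RLP}\big((\mathtt{HP}(I_0[i..(\|I_0\| - 1)], \mathit{true}), I_1)\big) & \text{if } \|\mathfrak{I}\| = 1 \text{ where } \exists I : I \in \mathfrak{I} \\ \mathtt{RLP}\big((\mathtt{HP}(I_0[i..(j - 1)], \mathit{false}), n(\mathfrak{I}, j))\big) & \text{if } i \neq j \text{ where } j = \arg\max_x : \exists l : \|l\| = x : \forall I \in \mathfrak{I} : I_0[0..(x - 1)] = l \\ \mathtt{RLP}\big((u(0), u(1), ..., u(15), v)\big) & \text{otherwise} \end{cases}$

where, in the final case:

$u(j) \equiv n(\{ I : I \in \mathfrak{I} \wedge I_0[i] = j \}, i + 1)$

$v = \begin{cases} I_1 & \text{if } \exists I : I \in \mathfrak{I} \wedge \|I_0\| = i \\ () & \text{otherwise} \end{cases}$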


D.l. Trie Database. Thus no explicit assumptions are made concerning what data is stored and what is not, since 
that is an implementation-specific consideration; we simply define the identity function mapping the key-value set 3 
to a 32-byte hash and assert that only a single such hash exists for any 3, which though not strictly true is accurate 
within acceptable precision given the Keccak hash’s collision resistance. In reality, a sensible implementation will not 
fully recompute the trie root hash for each set. 

A reasonable implementation will maintain a database of nodes determined from the computation of various tries
or, more formally, it will memoise the function $c$. This strategy uses the nature of the trie to both easily recall the
contents of any previous key-value set and to store multiple such sets in a very efficient manner. Due to the dependency
relationship, Merkle-proofs may be constructed with an $O(\log N)$ space requirement that can demonstrate a particular
leaf must exist within a trie of a given root hash.


Appendix E. Precompiled Contracts 

For each precompiled contract, we make use of a template function, $\Xi_{\mathtt{PRE}}$, which implements the out-of-gas checking.

(182) $\Xi_{\mathtt{PRE}}(\sigma, g, I) \equiv \begin{cases} (\varnothing, 0, A^0, ()) & \text{if } g < g_r \\ (\sigma, g - g_r, A^0, o) & \text{otherwise} \end{cases}$

The precompiled contracts each use these definitions and provide specifications for $o$ (the output data) and $g_r$,
the gas requirements.

For the elliptic curve DSA recover VM execution function, we also define $d$ to be the input data, well-defined for an
infinite length by appending zeroes as required. Importantly, in the case of an invalid signature
($\mathtt{ECDSARECOVER}(h, v, r, s) = \varnothing$), we have no output.


(183) $\Xi_{\mathtt{ECREC}} \equiv \Xi_{\mathtt{PRE}}$ where:

(184) $g_r = 3000$

(185) $|o| = \begin{cases} 0 & \text{if } \mathtt{ECDSARECOVER}(h, v, r, s) = \varnothing \\ 32 & \text{otherwise} \end{cases}$

(186) if $|o| = 32$:

(187) $o[0..11] = 0$

(188) $o[12..31] = \mathtt{KEC}(\mathtt{ECDSARECOVER}(h, v, r, s))[12..31]$ where:

(189) $d[0..(|I_d| - 1)] = I_d$

(190) $d[|I_d|..] = (0, 0, ...)$

(191) $h = d[0..31]$

(192) $v = d[32..63]$

(193) $r = d[64..95]$

(194) $s = d[96..127]$
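
The input handling in (189) to (194) amounts to zero-padding the call data and slicing four 32-byte words. The sketch below covers only that step; the function name is hypothetical and the actual key recovery is deliberately left out. Interpreting $v$, $r$ and $s$ as big-endian integers is an assumption made for convenience.

```python
from typing import Tuple


def parse_ecrec_input(data: bytes) -> Tuple[bytes, int, int, int]:
    """Split ECREC input into (h, v, r, s), padding short input with zero bytes."""
    # Only d[0..127] is ever read, so truncate, then right-pad with zeroes.
    d = data[:128].ljust(128, b"\x00")
    h = d[0:32]
    v = int.from_bytes(d[32:64], "big")
    r = int.from_bytes(d[64:96], "big")
    s = int.from_bytes(d[96:128], "big")
    return h, v, r, s


# Input that stops after the v word: r and s are read as zero.
h, v, r, s = parse_ecrec_input(b"\x11" * 32 + (27).to_bytes(32, "big"))
```

A call with truncated data therefore still parses, but yields $r = s = 0$, which the signature validity conditions of appendix F reject.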


The two hash functions, RIPEMD-160 and SHA2-256, are more trivially defined as an almost pass-through operation.
Their gas usage is dependent on the input data size, a factor rounded up to the nearest number of words.

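(195) $\Xi_{\mathtt{SHA256}} \equiv \Xi_{\mathtt{PRE}}$ where:

(196) $g_r = 60 + 12 \left\lceil \frac{|I_d|}{32} \right\rceil$

(197) $o[0..31] = \mathtt{SHA256}(I_d)$

(198) $\Xi_{\mathtt{RIP160}} \equiv \Xi_{\mathtt{PRE}}$ where:

(199) $g_r = 600 + 120 \left\lceil \frac{|I_d|}{32} \right\rceil$

(200) $o[0..11] = 0$

(201) $o[12..31] = \mathtt{RIPEMD160}(I_d)$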
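
The gas formulas (196) and (199), combined with the out-of-gas template (182), can be sketched as below. The function names are hypothetical, and the out-of-gas outcome is modelled simply as `None` rather than as the full tuple of (182). Only the SHA2-256 contract is executed here, because RIPEMD-160 is not guaranteed to be available in every build of Python's `hashlib`.

```python
import hashlib
from typing import Optional, Tuple


def words(length: int) -> int:
    """Number of 32-byte words needed to hold `length` bytes, rounded up."""
    return (length + 31) // 32


def sha256_gas(data_length: int) -> int:
    return 60 + 12 * words(data_length)      # (196)


def ripemd160_gas(data_length: int) -> int:
    return 600 + 120 * words(data_length)    # (199)


def sha256_precompile(data: bytes, gas: int) -> Optional[Tuple[int, bytes]]:
    """Return (remaining gas, 32-byte output), or None when out of gas."""
    required = sha256_gas(len(data))
    if gas < required:
        return None
    return gas - required, hashlib.sha256(data).digest()


# 33 bytes of input occupy two words: 60 + 12 * 2 = 84 gas.
result = sha256_precompile(b"x" * 33, 100)
```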






For the purposes here, we assume we have well-defined standard cryptographic functions for RIPEMD-160 and SHA2-256 of the form:

(203) $\mathtt{SHA256}(i \in \mathbb{B}) \equiv o \in \mathbb{B}_{32}$

(204) $\mathtt{RIPEMD160}(i \in \mathbb{B}) \equiv o \in \mathbb{B}_{20}$


Finally, the fourth contract, the identity function $\Xi_{\mathtt{ID}}$, simply defines the output as the input:

(205) $\Xi_{\mathtt{ID}} \equiv \Xi_{\mathtt{PRE}}$ where:

(206)

(207) $o = I_d$

Appendix F. Signing Transactions


The method of signing transactions is similar to the ‘Electrum style signatures’ as defined by Arnaud et al. [2017],
heading “Managing styles with Radium” in the bullet point list. This method utilises the SECP-256k1 curve as described
by sec and Wuille [2014], and is implemented similarly to as described by Gura et al. [2004] on p. 9 of 15, para. 3.

It is assumed that the sender has a valid private key $p_r$, which is a randomly selected positive integer (represented as
a byte array of length 32 in big-endian form) in the range $[1, \mathtt{secp256k1n} - 1]$.

We assert the functions ECDSAPUBKEY, ECDSARECOVER and ECDSASIGN. These are formally defined in the literature, e.g. 
by Johnson et al. [2001] and ECD; A 

(208) $\mathtt{ECDSAPUBKEY}(p_r \in \mathbb{B}_{32}) \equiv p_u \in \mathbb{B}_{64}$

(209) $\mathtt{ECDSASIGN}(e \in \mathbb{B}_{32}, p_r \in \mathbb{B}_{32}) \equiv (v \in \mathbb{B}_1, r \in \mathbb{B}_{32}, s \in \mathbb{B}_{32})$

(210) $\mathtt{ECDSARECOVER}(e \in \mathbb{B}_{32}, v \in \mathbb{B}_1, r \in \mathbb{B}_{32}, s \in \mathbb{B}_{32}) \equiv p_u \in \mathbb{B}_{64}$

Where $p_u$ is the public key, assumed to be a byte array of size 64 (formed from the concatenation of two positive
integers each $< 2^{256}$) and $p_r$ is the private key, a byte array of size 32 (or a single positive integer in the aforementioned
range). It is assumed that $v$ is the ‘recovery id’, a 1 byte value specifying the sign and finiteness of the curve point; this
value is in the range of [27, 30], however we declare the upper two possibilities, representing infinite values, invalid.

We declare that a signature is invalid unless all the following conditions are true: 

(211) $0 < r < \mathtt{secp256k1n}$

(212) $0 < s < \mathtt{secp256k1n} \div 2 + 1$

(213) $v \in \{27, 28\}$

where:

(214) $\mathtt{secp256k1n} = 115792089237316195423570985008687907852837564279074904382605163141518161494337$
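
Conditions (211) to (213) are plain range checks and can be written down directly. In this sketch the function name is hypothetical, and the division in (212) is read as integer division, which is an assumption about the notation.

```python
SECP256K1N = 115792089237316195423570985008687907852837564279074904382605163141518161494337


def signature_is_valid(v: int, r: int, s: int) -> bool:
    """Range checks (211)-(213); this does not verify the signature itself."""
    return (
        0 < r < SECP256K1N
        and 0 < s < SECP256K1N // 2 + 1   # integer division assumed for (212)
        and v in (27, 28)
    )


low_s_ok = signature_is_valid(27, 1, SECP256K1N // 2)
high_s_rejected = not signature_is_valid(27, 1, SECP256K1N // 2 + 1)
```

Restricting $s$ to the lower half of the range means that, of the two values $s$ and $\mathtt{secp256k1n} - s$ that ECDSA would otherwise both accept, only one passes.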


For a given private key, p r , the Ethereum address A(p r ) (a 160-bit value) to which it corresponds is defined as the 
right most 160-bits of the Keccak hash of the corresponding ECDSA public key: 


(215) $A(p_r) = \mathcal{B}_{96..255}(\mathtt{KEC}(\mathtt{ECDSAPUBKEY}(p_r)))$

The message hash, $h(T)$, to be signed is the Keccak hash of the transaction without the latter three signature
components, formally described as $T_r$, $T_s$ and $T_w$:


(216) $L_S(T) \equiv \begin{cases} (T_n, T_p, T_g, T_t, T_v, T_i) & \text{if } T_t = 0 \\ (T_n, T_p, T_g, T_t, T_v, T_d) & \text{otherwise} \end{cases}$

(217) $h(T) \equiv \mathtt{KEC}(L_S(T))$

The signed transaction $G(T, p_r)$ is defined as:

(218) $G(T, p_r) \equiv T$ except:

(219) $(T_w, T_r, T_s) = \mathtt{ECDSASIGN}(h(T), p_r)$

We may then define the sender function S of the transaction as: 

(220) $S(T) \equiv \mathcal{B}_{96..255}(\mathtt{KEC}(\mathtt{ECDSARECOVER}(h(T), T_w, T_r, T_s)))$

The assertion that the sender of a signed transaction equals the address of the signer should be self-evident: 


(221) $\forall T : \forall p_r : S(G(T, p_r)) \equiv A(p_r)$


^For the former citation, refer to section 6.2 for ECDSAPUBKEY, and section 7 for ECDSASIGN and ECDSARECOVER. 

Appendix G. Fee Schedule


The fee schedule G is a tuple of 31 scalar values corresponding to the relative costs, in gas, of a number of abstract 
operations that a transaction may effect. 


Name             Value   Description*
G_zero           0       Nothing is paid for operations of the set W_zero.
G_base           2       This is the amount of gas to pay for operations of the set W_base.
G_verylow        3       This is the amount of gas to pay for operations of the set W_verylow.
G_low            5       This is the amount of gas to pay for operations of the set W_low.
G_mid            8       This is the amount of gas to pay for operations of the set W_mid.
G_high           10      This is the amount of gas to pay for operations of the set W_high.
G_extcode        700     This is the amount of gas to pay for operations of the set W_extcode.
G_balance        400     This is the amount of gas to pay for a BALANCE operation.
G_sload          200     This is paid for an SLOAD operation.
G_jumpdest       1       This is paid for a JUMPDEST operation.
G_sset           20000   This is paid for an SSTORE operation when the storage value is set to non-zero from zero.
G_sreset         5000    This is the amount for an SSTORE operation when the storage value’s zeroness remains unchanged or is set to zero.
R_sclear         15000   This is the refund given (added into the refund counter) when the storage value is set to zero from non-zero.
R_selfdestruct   24000   This is the refund given (added into the refund counter) for self-destructing an account.
G_selfdestruct   5000    This is the amount of gas to pay for a SELFDESTRUCT operation.
G_create         32000   This is paid for a CREATE operation.
G_codedeposit    200     This is paid per byte for a CREATE operation to succeed in placing code into the state.
G_call           700     This is paid for a CALL operation.
G_callvalue      9000    This is paid for a non-zero value transfer as part of the CALL operation.
G_callstipend    2300    This is a stipend for the called contract subtracted from G_callvalue for a non-zero value transfer.


G new account 

25000 

This is 

G exp 

10 

This is 

G expbyte 

50 

This is 

Gmemory 

3 

This is 

Gtxcreate 

32000 

This is 

G txdatazero 

4 

This is 

Gtxdat anon zero 

68 

This is 

G transaction 

21000 

This is 

Glog 

375 

This is 

Glogdata 

8 

This is 

Glogtopic 

375 

This is 

G sha3 

30 

This is 

G sha3word 

6 

This is 

G copy 

3 

This is 


Gu oc 


20 


artial payment for *COPY operations, multiplied by the number of words copied, 
rounded up. 

This is a payment for a BLOCKHASH operation. 


Appendix H. Virtual Machine Specification 
When interpreting 256-bit binary values as integers, the representation is big-endian. 

When a 256-bit machine datum is converted to and from a 160-bit address or hash, the rightwards (low-order for BE) 
20 bytes are used and the left most 12 are discarded or filled with zeroes, thus the integer values (when the bytes are 
interpreted as big-endian) are equivalent. 


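H.1. Gas Cost. The general gas cost function, C, is defined as:

(222) $C(\sigma, \mu, I) \equiv C_{mem}(\mu'_i) - C_{mem}(\mu_i) + \begin{cases}
C_{SSTORE}(\sigma, \mu) & \text{if } w = \text{SSTORE} \\
G_{exp} & \text{if } w = \text{EXP} \wedge \mu_s[1] = 0 \\
G_{exp} + G_{expbyte} \times (1 + \lfloor \log_{256}(\mu_s[1]) \rfloor) & \text{if } w = \text{EXP} \wedge \mu_s[1] > 0 \\
G_{verylow} + G_{copy} \times \lceil \mu_s[2] \div 32 \rceil & \text{if } w = \text{CALLDATACOPY} \vee \text{CODECOPY} \\
G_{extcode} + G_{copy} \times \lceil \mu_s[3] \div 32 \rceil & \text{if } w = \text{EXTCODECOPY} \\
G_{log} + G_{logdata} \times \mu_s[1] & \text{if } w = \text{LOG0} \\
G_{log} + G_{logdata} \times \mu_s[1] + G_{logtopic} & \text{if } w = \text{LOG1} \\
G_{log} + G_{logdata} \times \mu_s[1] + 2 G_{logtopic} & \text{if } w = \text{LOG2} \\
G_{log} + G_{logdata} \times \mu_s[1] + 3 G_{logtopic} & \text{if } w = \text{LOG3} \\
G_{log} + G_{logdata} \times \mu_s[1] + 4 G_{logtopic} & \text{if } w = \text{LOG4} \\
C_{CALL}(\sigma, \mu) & \text{if } w = \text{CALL} \vee \text{CALLCODE} \vee \text{DELEGATECALL} \\
C_{SELFDESTRUCT}(\sigma, \mu) & \text{if } w = \text{SELFDESTRUCT} \\
G_{create} & \text{if } w = \text{CREATE} \\
G_{sha3} + G_{sha3word} \lceil \mu_s[1] \div 32 \rceil & \text{if } w = \text{SHA3} \\
G_{jumpdest} & \text{if } w = \text{JUMPDEST} \\
G_{sload} & \text{if } w = \text{SLOAD} \\
G_{zero} & \text{if } w \in W_{zero} \\
G_{base} & \text{if } w \in W_{base} \\
G_{verylow} & \text{if } w \in W_{verylow} \\
G_{low} & \text{if } w \in W_{low} \\
G_{mid} & \text{if } w \in W_{mid} \\
G_{high} & \text{if } w \in W_{high} \\
G_{extcode} & \text{if } w \in W_{extcode} \\
G_{balance} & \text{if } w = \text{BALANCE} \\
G_{blockhash} & \text{if } w = \text{BLOCKHASH}
\end{cases}$

(223) $w \equiv \begin{cases} I_b[\mu_{pc}] & \text{if } \mu_{pc} < \lVert I_b \rVert \\ \text{STOP} & \text{otherwise} \end{cases}$

where:

(224) $C_{mem}(a) \equiv G_{memory} \cdot a + \left\lfloor \frac{a^2}{512} \right\rfloor$

with $C_{CALL}$, $C_{SELFDESTRUCT}$ and $C_{SSTORE}$ as specified in the appropriate section below. We define the following subsets
of instructions: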


W_zero = {STOP, RETURN}

W_base = {ADDRESS, ORIGIN, CALLER, CALLVALUE, CALLDATASIZE, CODESIZE, GASPRICE, COINBASE,
TIMESTAMP, NUMBER, DIFFICULTY, GASLIMIT, POP, PC, MSIZE, GAS}

W_verylow = {ADD, SUB, NOT, LT, GT, SLT, SGT, EQ, ISZERO, AND, OR, XOR, BYTE, CALLDATALOAD,
MLOAD, MSTORE, MSTORE8, PUSH*, DUP*, SWAP*}

W_low = {MUL, DIV, SDIV, MOD, SMOD, SIGNEXTEND}

W_mid = {ADDMOD, MULMOD, JUMP}

W_high = {JUMPI}

W_extcode = {EXTCODESIZE}
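The data-dependent cases of C (EXP, the copy operations, LOG and SHA3) can be sketched as follows. This is an illustration only: the constants are taken from the fee schedule in Appendix G, the function names are hypothetical, and the memory-expansion term $C_{mem}(\mu'_i) - C_{mem}(\mu_i)$ is not included.

```python
G_EXP, G_EXPBYTE = 10, 50
G_VERYLOW, G_COPY = 3, 3
G_LOG, G_LOGDATA, G_LOGTOPIC = 375, 8, 375
G_SHA3, G_SHA3WORD = 30, 6


def words(n_bytes: int) -> int:
    """Ceiling of n_bytes / 32, i.e. the number of 32-byte words touched."""
    return (n_bytes + 31) // 32


def exp_cost(exponent: int) -> int:
    if exponent == 0:
        return G_EXP
    # 1 + floor(log256(exponent)) equals the byte length of the exponent.
    return G_EXP + G_EXPBYTE * ((exponent.bit_length() + 7) // 8)


def copy_cost(size: int) -> int:
    """CALLDATACOPY / CODECOPY, where size is the stack item mu_s[2]."""
    return G_VERYLOW + G_COPY * words(size)


def log_cost(data_size: int, topics: int) -> int:
    """LOG0..LOG4, where topics is the number of topics (0 to 4)."""
    return G_LOG + G_LOGDATA * data_size + topics * G_LOGTOPIC


def sha3_cost(size: int) -> int:
    return G_SHA3 + G_SHA3WORD * words(size)
```

For instance, an EXP with a two-byte exponent such as 256 costs 10 + 2 × 50 gas before any memory effects.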

Note the memory cost component, given as the product of $G_{memory}$ and the maximum of 0 and the ceiling of the number
of words in size that the memory must be over the current number of words, $\mu_i$, in order that all accesses reference valid
memory, whether for read or write. Such accesses must be for a non-zero number of bytes.

Referencing a zero-length range (e.g. by attempting to pass it as the input range to a CALL) does not require memory
to be extended to the beginning of the range. $\mu'_i$ is defined as this new maximum number of words of active memory;
special cases are given where these two are not equal.

Note also that $C_{mem}$ is the memory cost function (the expansion function being the difference between the cost before
and after). It is a polynomial, with the higher-order coefficient divided and floored, and thus linear up to 724B of memory
used, after which it costs substantially more.
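A minimal sketch of the memory cost function of equation (224) and of the expansion cost derived from it is given below; the function names are hypothetical, and $G_{memory} = 3$ is taken from the fee schedule.

```python
G_MEMORY = 3


def c_mem(a: int) -> int:
    """Total cost of a words of active memory: G_memory * a + floor(a^2 / 512)."""
    return G_MEMORY * a + (a * a) // 512


def expansion_cost(old_words: int, new_words: int) -> int:
    """Gas charged for growing active memory from old_words to new_words."""
    return c_mem(new_words) - c_mem(old_words)
```

With these definitions the quadratic term is still zero at 22 words and first contributes at 23 words, the smallest word count that covers 724 bytes: `c_mem(22)` is 66 while `c_mem(23)` is 70.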

While defining the instruction set, we defined the memory-expansion for range function, M, thus:

(225) $M(s, f, l) \equiv \begin{cases} s & \text{if } l = 0 \\ \max(s, \lceil (f + l) \div 32 \rceil) & \text{otherwise} \end{cases}$
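Equation (225) translates directly into code. The sketch below uses a hypothetical function name and assumes s is the current word count, f the byte offset and l the byte length of the accessed range.

```python
def mem_expand(s: int, f: int, l: int) -> int:
    """M(s, f, l): active memory size in words after touching l bytes at offset f."""
    if l == 0:
        # A zero-length range never extends memory, whatever its offset.
        return s
    return max(s, (f + l + 31) // 32)
```

The zero-length special case is what lets an instruction pass an arbitrary offset with a size of zero without paying for memory up to that offset.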
Another useful function is the "all but one 64th" function L, defined as:

(226) $L(n) \equiv n - \lfloor n / 64 \rfloor$
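As a small illustration of equation (226), the function can be written with integer division; the name `all_but_one_64th` is hypothetical.

```python
def all_but_one_64th(n: int) -> int:
    """L(n) = n - floor(n / 64)."""
    return n - n // 64
```

For example, of 6400 units of gas, 6300 can be passed on and 100 are retained.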


H.2. Instruction Set. As previously specified in section 9, these definitions take place in the final context there. In
particular we assume O is the EVM state-progression function and define the terms pertaining to the next cycle's state
$(\sigma', \mu')$ such that:

(227) $O(\sigma, \mu, A, I) \equiv (\sigma', \mu', A', I)$ with exceptions, as noted

Here given are the various exceptions to the state transition rules given in section 9, specified for each instruction,
together with the additional instruction-specific definitions of J and C. For each instruction, also specified is α, the
additional items placed on the stack, and δ, the items removed from the stack, as defined in section 9.

0s: Stop and Arithmetic Operations

All arithmetic is modulo $2^{256}$ unless otherwise noted. The zero-th power of zero $0^0$ is defined to be one.


Value Mnemonic δ α Description

0x00 STOP 0 0 This operation halts execution, outputting the empty sequence as per equation (132).

0x01 ADD 2 1 This is the addition operation.
    $\mu'_s[0] \equiv \mu_s[0] + \mu_s[1]$

0x02 MUL 2 1 This is the multiplication operation.
    $\mu'_s[0] \equiv \mu_s[0] \times \mu_s[1]$

0x03 SUB 2 1 This is the subtraction operation.
    $\mu'_s[0] \equiv \mu_s[0] - \mu_s[1]$

0x04 DIV 2 1 This is the integer division operation.
    $\mu'_s[0] \equiv \begin{cases} 0 & \text{if } \mu_s[1] = 0 \\ \lfloor \mu_s[0] \div \mu_s[1] \rfloor & \text{otherwise} \end{cases}$

0x05 SDIV 2 1 This is the signed integer division operation (truncated).
    $\mu'_s[0] \equiv \begin{cases} 0 & \text{if } \mu_s[1] = 0 \\ -2^{255} & \text{if } \mu_s[0] = -2^{255} \wedge \mu_s[1] = -1 \\ \mathrm{sgn}(\mu_s[0] \div \mu_s[1]) \lfloor |\mu_s[0] \div \mu_s[1]| \rfloor & \text{otherwise} \end{cases}$
    Where all values are treated as two's complement signed 256-bit integers.
    Note the overflow semantic when $-2^{255}$ is negated.

0x06 MOD 2 1 This is the modulo remainder operation.
    $\mu'_s[0] \equiv \begin{cases} 0 & \text{if } \mu_s[1] = 0 \\ \mu_s[0] \bmod \mu_s[1] & \text{otherwise} \end{cases}$

0x07 SMOD 2 1 This is the signed modulo remainder operation.
    $\mu'_s[0] \equiv \begin{cases} 0 & \text{if } \mu_s[1] = 0 \\ \mathrm{sgn}(\mu_s[0]) (|\mu_s[0]| \bmod |\mu_s[1]|) & \text{otherwise} \end{cases}$
    Where all values are treated as two's complement signed 256-bit integers.

0x08 ADDMOD 3 1 This is the modulo addition operation.
    $\mu'_s[0] \equiv \begin{cases} 0 & \text{if } \mu_s[2] = 0 \\ (\mu_s[0] + \mu_s[1]) \bmod \mu_s[2] & \text{otherwise} \end{cases}$
    All intermediate calculations of this operation are not subject to the $2^{256}$ modulo.

0x09 MULMOD 3 1 This is the modulo multiplication operation.
    $\mu'_s[0] \equiv \begin{cases} 0 & \text{if } \mu_s[2] = 0 \\ (\mu_s[0] \times \mu_s[1]) \bmod \mu_s[2] & \text{otherwise} \end{cases}$
    All intermediate calculations of this operation are not subject to the $2^{256}$ modulo.

0x0a EXP 2 1 This is the exponential operation.
    $\mu'_s[0] \equiv \mu_s[0]^{\mu_s[1]}$

0x0b SIGNEXTEND 2 1 Extend the length of a two's complement signed integer.
    $\forall i \in [0..255] : \mu'_s[0]_i \equiv \begin{cases} \mu_s[1]_t & \text{if } i \leq t \text{ where } t = 256 - 8(\mu_s[0] + 1) \\ \mu_s[1]_i & \text{otherwise} \end{cases}$
    $\mu_s[x]_i$ gives the ith bit (counting from zero) of $\mu_s[x]$.
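The signed operations in the table above act on 256-bit words reinterpreted as two's complement integers. The following sketch illustrates SDIV, SMOD and SIGNEXTEND on Python integers; all function names are hypothetical, and SIGNEXTEND is written in terms of the byte index counted from the least significant byte, which is an assumption about how the bit indices of the formula map onto ordinary shifts.

```python
MOD = 2**256
SIGN = 2**255


def to_signed(x: int) -> int:
    """Reinterpret an unsigned 256-bit word as a two's complement integer."""
    return x - MOD if x >= SIGN else x


def to_unsigned(x: int) -> int:
    return x % MOD


def sdiv(a: int, b: int) -> int:
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return 0
    q = abs(a) // abs(b)          # truncate towards zero
    if (a < 0) != (b < 0):
        q = -q
    return to_unsigned(q)          # -2**255 / -1 wraps back to -2**255


def smod(a: int, b: int) -> int:
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return 0
    r = abs(a) % abs(b)
    return to_unsigned(-r if a < 0 else r)   # result takes the sign of the dividend


def signextend(k: int, x: int) -> int:
    """Extend the sign bit of the (k+1)-byte value held in the low bytes of x."""
    if k >= 31:
        return x
    sign_bit = 8 * k + 7
    mask = (1 << (sign_bit + 1)) - 1
    if (x >> sign_bit) & 1:
        return x | (MOD - 1 - mask)   # fill all higher bits with ones
    return x & mask                   # clear all higher bits
```

The overflow case noted for SDIV falls out of the modular reduction: the mathematical quotient $2^{255}$ is not representable and maps back to the word that encodes $-2^{255}$.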
10s: Comparison & Bitwise Logic Operations

Value Mnemonic δ α Description

0x10 LT 2 1 This is the less-than comparison.
    $\mu'_s[0] \equiv \begin{cases} 1 & \text{if } \mu_s[0] < \mu_s[1] \\ 0 & \text{otherwise} \end{cases}$

0x11 GT 2 1 This is the greater-than comparison.
    $\mu'_s[0] \equiv \begin{cases} 1 & \text{if } \mu_s[0] > \mu_s[1] \\ 0 & \text{otherwise} \end{cases}$

0x12 SLT 2 1 This is the signed less-than comparison.
    $\mu'_s[0] \equiv \begin{cases} 1 & \text{if } \mu_s[0] < \mu_s[1] \\ 0 & \text{otherwise} \end{cases}$
    Where all values are treated as two's complement signed 256-bit integers.

0x13 SGT 2 1 This is the signed greater-than comparison.
    $\mu'_s[0] \equiv \begin{cases} 1 & \text{if } \mu_s[0] > \mu_s[1] \\ 0 & \text{otherwise} \end{cases}$
    Where all values are treated as two's complement signed 256-bit integers.

0x14 EQ 2 1 This is the equality comparison.
    $\mu'_s[0] \equiv \begin{cases} 1 & \text{if } \mu_s[0] = \mu_s[1] \\ 0 & \text{otherwise} \end{cases}$

0x15 ISZERO 1 1 This is the logical negation operation, also called the logical complement or the NOT operation.
    $\mu'_s[0] \equiv \begin{cases} 1 & \text{if } \mu_s[0] = 0 \\ 0 & \text{otherwise} \end{cases}$

0x16 AND 2 1 This is the bitwise AND operation.
    $\forall i \in [0..255] : \mu'_s[0]_i \equiv \mu_s[0]_i \wedge \mu_s[1]_i$

0x17 OR 2 1 This is the bitwise OR operation.
    $\forall i \in [0..255] : \mu'_s[0]_i \equiv \mu_s[0]_i \vee \mu_s[1]_i$

0x18 XOR 2 1 This is the bitwise XOR operation.
    $\forall i \in [0..255] : \mu'_s[0]_i \equiv \mu_s[0]_i \oplus \mu_s[1]_i$

0x19 NOT 1 1 This is the bitwise NOT operation.
    $\forall i \in [0..255] : \mu'_s[0]_i \equiv \begin{cases} 1 & \text{if } \mu_s[0]_i = 0 \\ 0 & \text{otherwise} \end{cases}$

0x1a BYTE 2 1 Retrieve a single byte from a word.
    $\forall i \in [0..255] : \mu'_s[0]_i \equiv \begin{cases} \mu_s[1]_{(i + 8\mu_s[0])} & \text{if } i < 8 \wedge \mu_s[0] < 32 \\ 0 & \text{otherwise} \end{cases}$
    For the Nth byte, we count from the left (i.e. N = 0 would be the most significant in big endian).




20s: SHA3

Value Mnemonic δ α Description

0x20 SHA3 2 1 Compute a Keccak-256 hash.
    $\mu'_s[0] \equiv \mathtt{Keccak}(\mu_m[\mu_s[0] \ldots (\mu_s[0] + \mu_s[1] - 1)])$
    $\mu'_i \equiv M(\mu_i, \mu_s[0], \mu_s[1])$
30s: Environmental Information

Value Mnemonic δ α Description

0x30 ADDRESS 0 1 Get the address of the currently executing account.
    $\mu'_s[0] \equiv I_a$

0x31 BALANCE 1 1 Get the balance of the given account.
    $\mu'_s[0] \equiv \begin{cases} \sigma[\mu_s[0]]_b & \text{if } \sigma[\mu_s[0] \bmod 2^{160}] \neq \varnothing \\ 0 & \text{otherwise} \end{cases}$

0x32 ORIGIN 0 1 Get the address that the execution originated from.
    $\mu'_s[0] \equiv I_o$
    This is the sender of the original transaction; it is never an account with non-empty associated code.

0x33 CALLER 0 1 Get the caller address.
    $\mu'_s[0] \equiv I_s$
    This is the address of the account that is directly responsible for this execution.

0x34 CALLVALUE 0 1 Get the value deposited by the instruction/transaction responsible for this execution.
    $\mu'_s[0] \equiv I_v$

0x35 CALLDATALOAD 1 1 Get the input data of the current environment.
    $\mu'_s[0] \equiv I_d[\mu_s[0] \ldots (\mu_s[0] + 31)]$ with $I_d[x] = 0$ if $x \geq \lVert I_d \rVert$
    This pertains to the input data passed with the message call instruction or transaction.

0x36 CALLDATASIZE 0 1 Get the size of the input data in the current environment.
    $\mu'_s[0] \equiv \lVert I_d \rVert$
    This pertains to the input data passed with the message call instruction or transaction.

0x37 CALLDATACOPY 3 0 Copy the input data in the current environment to memory.
    $\forall i \in \{0 \ldots \mu_s[2] - 1\} : \mu'_m[\mu_s[0] + i] \equiv \begin{cases} I_d[\mu_s[1] + i] & \text{if } \mu_s[1] + i < \lVert I_d \rVert \\ 0 & \text{otherwise} \end{cases}$
    The additions in $\mu_s[1] + i$ are not subject to the $2^{256}$ modulo.
    $\mu'_i \equiv M(\mu_i, \mu_s[0], \mu_s[2])$
    This pertains to the input data passed with the message call instruction or transaction.

0x38 CODESIZE 0 1 Get the size of code running in the current environment.
    $\mu'_s[0] \equiv \lVert I_b \rVert$

0x39 CODECOPY 3 0 Copy code running in the current environment to memory.
    $\forall i \in \{0 \ldots \mu_s[2] - 1\} : \mu'_m[\mu_s[0] + i] \equiv \begin{cases} I_b[\mu_s[1] + i] & \text{if } \mu_s[1] + i < \lVert I_b \rVert \\ \text{STOP} & \text{otherwise} \end{cases}$
    $\mu'_i \equiv M(\mu_i, \mu_s[0], \mu_s[2])$
    The additions in $\mu_s[1] + i$ are not subject to the $2^{256}$ modulo.

0x3a GASPRICE 0 1 Get the price of gas in the current environment.
    $\mu'_s[0] \equiv I_p$
    This is the gas price specified by the originating transaction.

0x3b EXTCODESIZE 1 1 Get the size of an account's code.
    $\mu'_s[0] \equiv \lVert \sigma[\mu_s[0] \bmod 2^{160}]_c \rVert$

0x3c EXTCODECOPY 4 0 Copy an account's code to memory.
    $\forall i \in \{0 \ldots \mu_s[3] - 1\} : \mu'_m[\mu_s[1] + i] \equiv \begin{cases} c[\mu_s[2] + i] & \text{if } \mu_s[2] + i < \lVert c \rVert \\ \text{STOP} & \text{otherwise} \end{cases}$
    where $c \equiv \sigma[\mu_s[0] \bmod 2^{160}]_c$
    $\mu'_i \equiv M(\mu_i, \mu_s[1], \mu_s[3])$
    The additions in $\mu_s[2] + i$ are not subject to the $2^{256}$ modulo.
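Reads beyond the end of the input data yield zero bytes, both for CALLDATALOAD and for CALLDATACOPY. A minimal sketch of that padding rule follows, assuming the input data is a Python `bytes` object and that memory is a `bytearray` already large enough for the write (memory expansion via M is not modelled); the function names are hypothetical.

```python
def calldataload(data: bytes, offset: int) -> int:
    """Read 32 bytes of input data at offset, padding with zeros past the end."""
    chunk = data[offset:offset + 32]
    chunk = chunk + b"\x00" * (32 - len(chunk))
    return int.from_bytes(chunk, "big")


def calldatacopy(memory: bytearray, data: bytes, mem_off: int, data_off: int, size: int) -> None:
    """Copy size bytes of input data into memory, writing zeros past the end of data."""
    for i in range(size):
        src = data_off + i            # plain integer addition, never reduced modulo 2**256
        memory[mem_off + i] = data[src] if src < len(data) else 0
```

The same pattern applies to CODECOPY and EXTCODECOPY, with the padding value being the STOP opcode, whose byte value is zero.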
40s: Block Information

Value Mnemonic δ α Description

0x40 BLOCKHASH 1 1 Get the hash of one of the 256 most recent complete blocks.
    $\mu'_s[0] \equiv P(I_{H_p}, \mu_s[0], 0)$
    where P is the hash of a block of a particular number, up to a maximum age. 0 is left on the stack if the
    looked-for block number is greater than the current block number or more than 256 blocks behind the current block.
    $P(h, n, a) \equiv \begin{cases} 0 & \text{if } n > H_i \vee a = 256 \vee h = 0 \\ h & \text{if } n = H_i \\ P(H_p, n, a + 1) & \text{otherwise} \end{cases}$
    and we assert that the header H can be determined because its hash h is the parent hash in the block following it.

0x41 COINBASE 0 1 Get the block's beneficiary address.
    $\mu'_s[0] \equiv I_{H_c}$

0x42 TIMESTAMP 0 1 Get the block's timestamp.
    $\mu'_s[0] \equiv I_{H_s}$

0x43 NUMBER 0 1 Get the block's number.
    $\mu'_s[0] \equiv I_{H_i}$

0x44 DIFFICULTY 0 1 Get the block's difficulty.
    $\mu'_s[0] \equiv I_{H_d}$

0x45 GASLIMIT 0 1 Get the block's gas limit.
    $\mu'_s[0] \equiv I_{H_l}$
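The recursive lookup P used by BLOCKHASH walks the chain of parent hashes until it finds the requested number or exceeds the age limit. The sketch below is an iterative rendering under loud assumptions: headers are modelled as a dictionary from block hash to a `(number, parent_hash)` pair, hashes are plain integers, and the function name is hypothetical.

```python
def block_hash(headers: dict, h: int, n: int, a: int = 0) -> int:
    """P(h, n, a): hash of block number n, starting the walk at the header with hash h."""
    while True:
        if h == 0 or a == 256:
            return 0                  # ran off the chain or exceeded the maximum age
        number, parent = headers[h]   # the header H whose hash is h
        if n > number:
            return 0                  # requested block is not older than H
        if n == number:
            return h
        h, a = parent, a + 1          # step to the parent header
```

Because the walk starts at the parent of the current block, asking for the current block's own number returns 0, and exactly 256 ancestors are reachable.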
50s: Stack, Memory, Storage and Flow Operations

Value Mnemonic δ α Description

0x50 POP 1 0 Remove the first (top) item from the stack.

0x51 MLOAD 1 1 Load a word from memory.
    $\mu'_s[0] \equiv \mu_m[\mu_s[0] \ldots (\mu_s[0] + 31)]$
    $\mu'_i \equiv \max(\mu_i, \lceil (\mu_s[0] + 32) \div 32 \rceil)$
    The addition in the calculation of $\mu'_i$ is not subject to the $2^{256}$ modulo.

0x52 MSTORE 2 0 Save a word to memory.
    $\mu'_m[\mu_s[0] \ldots (\mu_s[0] + 31)] \equiv \mu_s[1]$
    $\mu'_i \equiv \max(\mu_i, \lceil (\mu_s[0] + 32) \div 32 \rceil)$
    The addition in the calculation of $\mu'_i$ is not subject to the $2^{256}$ modulo.

0x53 MSTORE8 2 0 Save a byte to memory.
    $\mu'_m[\mu_s[0]] \equiv (\mu_s[1] \bmod 256)$
    $\mu'_i \equiv \max(\mu_i, \lceil (\mu_s[0] + 1) \div 32 \rceil)$
    The addition in the calculation of $\mu'_i$ is not subject to the $2^{256}$ modulo.

0x54 SLOAD 1 1 Load a word from storage.
    $\mu'_s[0] \equiv \sigma[I_a]_s[\mu_s[0]]$

0x55 SSTORE 2 0 Save a word to storage.
    $\sigma'[I_a]_s[\mu_s[0]] \equiv \mu_s[1]$
    $C_{SSTORE}(\sigma, \mu) \equiv \begin{cases} G_{sset} & \text{if } \mu_s[1] \neq 0 \wedge \sigma[I_a]_s[\mu_s[0]] = 0 \\ G_{sreset} & \text{otherwise} \end{cases}$
    $A'_r \equiv A_r + \begin{cases} R_{sclear} & \text{if } \mu_s[1] = 0 \wedge \sigma[I_a]_s[\mu_s[0]] \neq 0 \\ 0 & \text{otherwise} \end{cases}$

0x56 JUMP 1 0 Alter the program counter.
    $J_{JUMP}(\mu) \equiv \mu_s[0]$
    This has the effect of writing said value to $\mu_{pc}$. See equation (138).

0x57 JUMPI 2 0 Conditionally alter the program counter.
    $J_{JUMPI}(\mu) \equiv \begin{cases} \mu_s[0] & \text{if } \mu_s[1] \neq 0 \\ \mu_{pc} + 1 & \text{otherwise} \end{cases}$
    This has the effect of writing said value to $\mu_{pc}$. See equation (138).

0x58 PC 0 1 Get the value of the program counter prior to the increment corresponding to this instruction.
    $\mu'_s[0] \equiv \mu_{pc}$

0x59 MSIZE 0 1 Get the size of active memory in bytes.
    $\mu'_s[0] \equiv 32\mu_i$

0x5a GAS 0 1 Get the amount of available gas, including the corresponding reduction for the cost of this instruction.
    $\mu'_s[0] \equiv \mu_g$

0x5b JUMPDEST 0 0 Mark a valid destination for jumps.
    This operation has no effect on the machine state during execution.
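The cost and refund rules given for SSTORE above depend only on whether the old and new storage values are zero. A minimal sketch, with constants from the fee schedule and a hypothetical function name:

```python
G_SSET, G_SRESET, R_SCLEAR = 20000, 5000, 15000


def sstore_cost_and_refund(current: int, new: int) -> tuple:
    """Return (gas cost, refund added to the refund counter) for one SSTORE."""
    cost = G_SSET if (new != 0 and current == 0) else G_SRESET
    refund = R_SCLEAR if (new == 0 and current != 0) else 0
    return cost, refund
```

Only the zero to non-zero transition is charged the high rate, and only the non-zero to zero transition earns a refund.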
60s & 70s: Push Operations

Value Mnemonic δ α Description

0x60 PUSH1 0 1 Place a 1-byte item on the stack.
    $\mu'_s[0] \equiv c(\mu_{pc} + 1)$
    where $c(x) \equiv \begin{cases} I_b[x] & \text{if } x < \lVert I_b \rVert \\ 0 & \text{otherwise} \end{cases}$
    The bytes are read in line from the program code's bytes array.
    The function c ensures the bytes default to zero if they extend past the limits.
    The byte is right-aligned (i.e. it takes the lowest significant place and the last, highest
    address, which is the big-endian interpretation).

0x61 PUSH2 0 1 Place a 2-byte item on the stack.
    $\mu'_s[0] \equiv c((\mu_{pc} + 1) \ldots (\mu_{pc} + 2))$
    with $c(x) \equiv (c(x_0), \ldots, c(x_{\lVert x \rVert - 1}))$ with c as defined above.
    Similarly, the bytes are right-aligned (i.e. they take the lowest significant place and the
    last, highest address, which is the big-endian interpretation).

⋮

0x7f PUSH32 0 1 Place a 32-byte (full word) item on the stack.
    $\mu'_s[0] \equiv c((\mu_{pc} + 1) \ldots (\mu_{pc} + 32))$
    where c is defined as above.
    Similarly, the bytes are right-aligned (i.e. they take the lowest significant place and the
    last, highest address, which is the big-endian interpretation).
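The zero-defaulting function c and the big-endian interpretation of the pushed bytes can be sketched as follows; the function name `push_value` is hypothetical and the code is assumed to be a Python `bytes` object.

```python
def push_value(code: bytes, pc: int, n: int) -> int:
    """Value placed on the stack by PUSHn located at position pc in code."""
    # Bytes beyond the end of the code default to zero, as c(x) prescribes.
    raw = bytes(code[i] if i < len(code) else 0 for i in range(pc + 1, pc + 1 + n))
    return int.from_bytes(raw, "big")
```

A PUSH2 whose second immediate byte is cut off by the end of the code therefore pushes the first byte shifted left by eight bits, not the byte alone.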

80s: Duplication Operations

Value Mnemonic δ α Description

0x80 DUP1 1 2 Duplicate the 1st stack item.
    $\mu'_s[0] \equiv \mu_s[0]$

0x81 DUP2 2 3 Duplicate the 2nd stack item.
    $\mu'_s[0] \equiv \mu_s[1]$

⋮

0x8f DUP16 16 17 Duplicate the 16th stack item.
    $\mu'_s[0] \equiv \mu_s[15]$

90s: Exchange Operations

Value Mnemonic δ α Description

0x90 SWAP1 2 2 Exchange the 1st and the 2nd stack items.
    $\mu'_s[0] \equiv \mu_s[1]$
    $\mu'_s[1] \equiv \mu_s[0]$

0x91 SWAP2 3 3 Exchange the 1st and the 3rd stack items.
    $\mu'_s[0] \equiv \mu_s[2]$
    $\mu'_s[2] \equiv \mu_s[0]$

⋮

0x9f SWAP16 17 17 Exchange the 1st and the 17th stack items.
    $\mu'_s[0] \equiv \mu_s[16]$
    $\mu'_s[16] \equiv \mu_s[0]$
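The duplication and exchange families differ only in the index they touch. A sketch on a Python list whose element 0 is the top of the stack (matching $\mu_s[0]$) is shown below; the function names are hypothetical and stack-depth checks are omitted.

```python
def dup(stack: list, n: int) -> None:
    """DUPn: push a copy of the n-th stack item (1-based) onto the top."""
    stack.insert(0, stack[n - 1])


def swap(stack: list, n: int) -> None:
    """SWAPn: exchange the top item with the (n+1)-th stack item."""
    stack[0], stack[n] = stack[n], stack[0]
```

This also makes the δ and α columns concrete: DUPn reads n items and leaves n + 1, while SWAPn reads and leaves n + 1.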
a0s: Logging Operations

For all logging operations, the state change is to append an additional log entry on to the substate's log series:

$A'_l \equiv A_l \cdot (I_a, t, \mu_m[\mu_s[0] \ldots (\mu_s[0] + \mu_s[1] - 1)])$

and to update the memory consumption counter:

$\mu'_i \equiv M(\mu_i, \mu_s[0], \mu_s[1])$

The entry's topic series, t, differs accordingly:

Value Mnemonic δ α Description

0xa0 LOG0 2 0 Append the log record with no topics.
    $t \equiv ()$

0xa1 LOG1 3 0 Append the log record with one topic.
    $t \equiv (\mu_s[2])$

⋮

0xa4 LOG4 6 0 Append the log record with four topics.
    $t \equiv (\mu_s[2], \mu_s[3], \mu_s[4], \mu_s[5])$
f0s: System operations

Value Mnemonic δ α Description

0xf0 CREATE 3 1 Create a new account with associated code.
    $i \equiv \mu_m[\mu_s[1] \ldots (\mu_s[1] + \mu_s[2] - 1)]$
    $(\sigma', \mu'_g, A^+) \equiv \begin{cases} \Lambda(\sigma^*, I_a, I_o, L(\mu_g), I_p, \mu_s[0], i, I_e + 1) & \text{if } \mu_s[0] \leq \sigma[I_a]_b \wedge I_e < 1024 \\ (\sigma, \mu_g, \varnothing) & \text{otherwise} \end{cases}$
    $\sigma^* \equiv \sigma$ except $\sigma^*[I_a]_n = \sigma[I_a]_n + 1$
    $A' \equiv A \Cup A^+$ which implies: $A'_s \equiv A_s \cup A^+_s \wedge A'_l \equiv A_l \cdot A^+_l \wedge A'_r \equiv A_r + A^+_r$
    $\mu'_s[0] \equiv x$
    where x = 0 if the code execution for this operation failed due to an exceptional halting
    $Z(\sigma^*, \mu, I) = \top$ or $I_e = 1024$ (the maximum call depth limit is reached) or $\mu_s[0] > \sigma[I_a]_b$
    (the balance of the caller is too low to fulfil the value transfer); and otherwise $x = A(I_a, \sigma[I_a]_n)$,
    the address of the newly created account.
    $\mu'_i \equiv M(\mu_i, \mu_s[1], \mu_s[2])$
    Thus the operand order is: value, input offset, input size.

0xf1 CALL 7 1 Message-call into an account.
    $i \equiv \mu_m[\mu_s[3] \ldots (\mu_s[3] + \mu_s[4] - 1)]$
    $(\sigma', g', A^+, o) \equiv \begin{cases} \Theta(\sigma, I_a, I_o, t, t, C_{CALLGAS}(\mu), I_p, \mu_s[2], \mu_s[2], i, I_e + 1) & \text{if } \mu_s[2] \leq \sigma[I_a]_b \wedge I_e < 1024 \\ (\sigma, g, \varnothing, ()) & \text{otherwise} \end{cases}$
    $n \equiv \min(\{\mu_s[6], |o|\})$
    $\mu'_m[\mu_s[5] \ldots (\mu_s[5] + n - 1)] = o[0 \ldots (n - 1)]$
    $\mu'_g \equiv \mu_g + g'$
    $\mu'_s[0] \equiv x$
    $A' \equiv A \Cup A^+$
    $t \equiv \mu_s[1] \bmod 2^{160}$
    where x = 0 if the code execution for this operation failed due to an exceptional halting
    $Z(\sigma, \mu, I) = \top$ or if $\mu_s[2] > \sigma[I_a]_b$ (not enough funds) or $I_e = 1024$ (call depth limit reached);
    x = 1 otherwise.
    $\mu'_i \equiv M(M(\mu_i, \mu_s[3], \mu_s[4]), \mu_s[5], \mu_s[6])$
    Thus the operand order is: gas, to, value, in offset, in size, out offset, out size.
    $C_{CALL}(\sigma, \mu) \equiv C_{GASCAP}(\sigma, \mu) + C_{EXTRA}(\sigma, \mu)$
    $C_{CALLGAS}(\sigma, \mu) \equiv \begin{cases} C_{GASCAP}(\sigma, \mu) + G_{callstipend} & \text{if } \mu_s[2] \neq 0 \\ C_{GASCAP}(\sigma, \mu) & \text{otherwise} \end{cases}$
    $C_{GASCAP}(\sigma, \mu) \equiv \begin{cases} \min\{L(\mu_g - C_{EXTRA}(\sigma, \mu)), \mu_s[0]\} & \text{if } \mu_g \geq C_{EXTRA}(\sigma, \mu) \\ \mu_s[0] & \text{otherwise} \end{cases}$
    $C_{EXTRA}(\sigma, \mu) \equiv G_{call} + C_{XFER}(\mu) + C_{NEW}(\sigma, \mu)$
    $C_{XFER}(\mu) \equiv \begin{cases} G_{callvalue} & \text{if } \mu_s[2] \neq 0 \\ 0 & \text{otherwise} \end{cases}$
    $C_{NEW}(\sigma, \mu) \equiv \begin{cases} G_{newaccount} & \text{if } \sigma[\mu_s[1] \bmod 2^{160}] = \varnothing \\ 0 & \text{otherwise} \end{cases}$

0xf2 CALLCODE 7 1 Message-call into this account with an alternative account's code.
    Exactly equivalent to CALL except:
    $(\sigma', g', A^+, o) \equiv \begin{cases} \Theta(\sigma^*, I_a, I_o, I_a, t, C_{CALLGAS}(\mu), I_p, \mu_s[2], \mu_s[2], i, I_e + 1) & \text{if } \mu_s[2] \leq \sigma[I_a]_b \wedge I_e < 1024 \\ (\sigma, g, \varnothing, ()) & \text{otherwise} \end{cases}$
    Note the change in the fourth parameter to the call $\Theta$ from the 2nd stack value $\mu_s[1]$
    (as in CALL) to the present address $I_a$. This means that the recipient is in fact the
    same account as at present, simply that the code is overwritten.

0xf3 RETURN 2 0 Halt execution returning output data.
    $H_{RETURN}(\mu) \equiv \mu_m[\mu_s[0] \ldots (\mu_s[0] + \mu_s[1] - 1)]$
    This has the effect of halting the execution at this point with output defined.
    See equation (132).
    $\mu'_i \equiv M(\mu_i, \mu_s[0], \mu_s[1])$
0xf4 DELEGATECALL 6 1 Message-call into this account with an alternative account's code, but persisting
    the current values for sender and value. Compared with CALL, DELEGATECALL
    takes one fewer argument. The omitted argument is $\mu_s[2]$. As a result, $\mu_s[3]$,
    $\mu_s[4]$, $\mu_s[5]$ and $\mu_s[6]$ in the definition of CALL should respectively be replaced
    with $\mu_s[2]$, $\mu_s[3]$, $\mu_s[4]$ and $\mu_s[5]$. Otherwise it is exactly equivalent to CALL
    except:
    $(\sigma', g', A^+, o) \equiv \begin{cases} \Theta(\sigma^*, I_s, I_o, I_a, t, \mu_s[0], I_p, 0, I_v, i, I_e + 1) & \text{if } I_v \leq \sigma[I_a]_b \wedge I_e < 1024 \\ (\sigma, g, \varnothing, ()) & \text{otherwise} \end{cases}$
    Note the changes (in addition to that of the fourth parameter) to the second and
    ninth parameters to the call $\Theta$. This means that the recipient is in fact the same
    account as at present, simply that the code is overwritten and the context is almost
    entirely identical.

0xfe INVALID 0 0 Designated invalid instruction.

0xff SELFDESTRUCT 1 0 Halt execution and register account for later deletion.
    $A'_s \equiv A_s \cup \{I_a\}$
    $\sigma'[\mu_s[0] \bmod 2^{160}]_b \equiv \sigma[\mu_s[0] \bmod 2^{160}]_b + \sigma[I_a]_b$
    $\sigma'[I_a]_b \equiv 0$
    $A'_r \equiv A_r + \begin{cases} R_{selfdestruct} & \text{if } I_a \notin A_s \\ 0 & \text{otherwise} \end{cases}$
    $C_{SELFDESTRUCT}(\sigma, \mu) \equiv G_{selfdestruct} + \begin{cases} G_{newaccount} & \text{if } \sigma[\mu_s[0] \bmod 2^{160}] = \varnothing \\ 0 & \text{otherwise} \end{cases}$
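The gas functions defined for CALL above ($C_{EXTRA}$, $C_{GASCAP}$, $C_{CALL}$ and $C_{CALLGAS}$) compose as in the following sketch. The function names and the boolean `target_exists` (standing in for the test $\sigma[\mu_s[1] \bmod 2^{160}] \neq \varnothing$) are assumptions of this illustration; the constants come from the fee schedule.

```python
G_CALL, G_CALLVALUE, G_CALLSTIPEND, G_NEWACCOUNT = 700, 9000, 2300, 25000


def all_but_one_64th(n: int) -> int:
    return n - n // 64


def c_extra(value: int, target_exists: bool) -> int:
    xfer = G_CALLVALUE if value != 0 else 0
    new = 0 if target_exists else G_NEWACCOUNT
    return G_CALL + xfer + new


def c_gascap(gas_available: int, requested: int, value: int, target_exists: bool) -> int:
    extra = c_extra(value, target_exists)
    if gas_available >= extra:
        # The callee can receive at most all but one 64th of what remains.
        return min(all_but_one_64th(gas_available - extra), requested)
    return requested


def c_call(gas_available: int, requested: int, value: int, target_exists: bool) -> int:
    """Gas charged to the caller for the CALL instruction itself."""
    return c_gascap(gas_available, requested, value, target_exists) + c_extra(value, target_exists)


def c_callgas(gas_available: int, requested: int, value: int, target_exists: bool) -> int:
    """Gas made available to the callee, including the stipend for value transfers."""
    cap = c_gascap(gas_available, requested, value, target_exists)
    return cap + G_CALLSTIPEND if value != 0 else cap
```

For example, with 100000 gas available and an oversized request to an existing account with no value transfer, the callee receives 97749 gas and the caller is charged 98449.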


Appendix I. Genesis Block 
The genesis block is 15 items, and is specified thus:

(228) $((0_{256}, \mathtt{KEC}(\mathtt{RLP}(())), 0_{160}, stateRoot, 0, 0, 0_{2048}, 2^{17}, 0, 0, 3141592, time, 0, 0_{256}, \mathtt{KEC}((42))), (), ())$

Where $0_{256}$ refers to the parent hash, a 256-bit hash which is all zeroes; $0_{160}$ refers to the beneficiary address, a 160-bit
hash which is all zeroes; $0_{2048}$ refers to the log bloom, 2048 bits of all zeroes; $2^{17}$ refers to the difficulty; the transaction
trie root, receipt trie root, gas used, block number and extradata are all 0, being equivalent to the empty byte array.
The sequences of both ommers and transactions are empty and represented by (). KEC((42)) refers to the Keccak hash
of a byte array of length one whose first and only byte is of value 42, used for the nonce. The KEC(RLP(())) value refers to
the hash of the ommer list in RLP, an empty list.

The proof-of-concept series include a development premine, making the state root hash some value stateRoot. Also 
time will be set to the initial timestamp of the genesis block. The latest documentation should be consulted for those 
values. 


Appendix J. Ethash 

J.1. Definitions. We employ the following definitions:

Name              Value    Description

J_wordbytes       4        Bytes in word.
J_datasetinit     2^30     Bytes in dataset at genesis.
J_datasetgrowth   2^23     Dataset growth per epoch.
J_cacheinit       2^24     Bytes in cache at genesis.
J_cachegrowth     2^17     Cache growth per epoch.
J_epoch           30000    Blocks per epoch.
J_mixbytes        128      Mix length in bytes.
J_hashbytes       64       Hash length in bytes.
J_parents         256      Number of parents of each dataset element.
J_cacherounds     3        Number of rounds in cache production.
J_accesses        64       Number of accesses in hashimoto loop.


J.2. Size of dataset and cache. The sizes of Ethash's cache $c \in \mathbb{B}$ and dataset $d \in \mathbb{B}$ depend on the epoch, which in
turn depends on the block number.

(229) $E_{epoch}(H_i) = \left\lfloor \frac{H_i}{J_{epoch}} \right\rfloor$

The size of the dataset grows by $J_{datasetgrowth}$ bytes, and the size of the cache by $J_{cachegrowth}$ bytes, every epoch. In
order to avoid regularity leading to cyclic behavior, the size, counted in units of $J_{mixbytes}$ for the dataset and of
$J_{hashbytes}$ for the cache, must be a prime number. Therefore the size is reduced by a multiple of $J_{mixbytes}$ for the
dataset, and of $J_{hashbytes}$ for the cache. Let $d_{size} = \lVert d \rVert$ be the size of the dataset, which is calculated using

(230) $d_{size} = E_{prime}(J_{datasetinit} + J_{datasetgrowth} \cdot E_{epoch} - J_{mixbytes}, J_{mixbytes})$

The size of the cache, $c_{size}$, is calculated using

(231) $c_{size} = E_{prime}(J_{cacheinit} + J_{cachegrowth} \cdot E_{epoch} - J_{hashbytes}, J_{hashbytes})$

(232) $E_{prime}(x, y) = \begin{cases} x & \text{if } x / y \in \mathbb{P} \\ E_{prime}(x - 1 \cdot y, y) & \text{otherwise} \end{cases}$
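Equations (229) to (232) can be evaluated directly. The sketch below is an iterative rendering with hypothetical function names and a simple trial-division primality test, which is an implementation choice of this sketch and not part of the specification; it is adequate here because the quotients are only a few million.

```python
J_DATASETINIT, J_DATASETGROWTH = 2**30, 2**23
J_CACHEINIT, J_CACHEGROWTH = 2**24, 2**17
J_EPOCH, J_MIXBYTES, J_HASHBYTES = 30000, 128, 64


def is_prime(n: int) -> bool:
    """Trial division; fine for the small quotients that occur here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def e_prime(x: int, y: int) -> int:
    """Largest value x - k*y (k >= 0) such that the quotient by y is prime."""
    while not is_prime(x // y):
        x -= y
    return x


def epoch(block_number: int) -> int:
    return block_number // J_EPOCH


def dataset_size(block_number: int) -> int:
    return e_prime(J_DATASETINIT + J_DATASETGROWTH * epoch(block_number) - J_MIXBYTES, J_MIXBYTES)


def cache_size(block_number: int) -> int:
    return e_prime(J_CACHEINIT + J_CACHEGROWTH * epoch(block_number) - J_HASHBYTES, J_HASHBYTES)
```

For the first epoch this gives a cache of 16776896 bytes and a dataset of 1073739904 bytes, both slightly below the nominal $2^{24}$ and $2^{30}$.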


J.3. Dataset generation. In order to generate the dataset we need the cache c, which is an array of bytes. It depends
on the cache size $c_{size}$ and the seed hash $s \in \mathbb{B}_{32}$.

J.3.1. Seed hash. The seed hash is different for every epoch. For the first epoch it is the Keccak-256 hash of a series of
32 bytes of zeros. For every other epoch it is always the Keccak-256 hash of the previous seed hash:

(233) $s = C_{seedhash}(H_i)$

(234) $C_{seedhash}(H_i) = \begin{cases} \mathtt{KEC}(0_{32}) & \text{if } E_{epoch}(H_i) = 0 \\ \mathtt{KEC}(C_{seedhash}(H_i - J_{epoch})) & \text{otherwise} \end{cases}$

With $0_{32}$ being 32 bytes of zeros.


J.3.2. Cache. The cache production process involves using the seed hash to first sequentially fill up $c_{size}$ bytes of
memory, then performing $J_{cacherounds}$ passes of the RandMemoHash algorithm created by Lerner [2014]. The initial
cache c', being an array of arrays of single bytes, will be constructed as follows.

We define the array $c_i$, consisting of 64 single bytes, as the ith element of the initial cache:

(235) $c_i = \begin{cases} \mathtt{KEC512}(s) & \text{if } i = 0 \\ \mathtt{KEC512}(c_{i-1}) & \text{otherwise} \end{cases}$

Therefore c' can be defined as

(236) $c'[i] = c_i \quad \forall i < n$

(237) $n = \left\lfloor \frac{c_{size}}{J_{hashbytes}} \right\rfloor$

The cache is calculated by performing $J_{cacherounds}$ rounds of the RandMemoHash algorithm on the initial cache c':

(238) $c = E_{cacherounds}(c', J_{cacherounds})$

(239) $E_{cacherounds}(x, y) = \begin{cases} x & \text{if } y = 0 \\ E_{RMH}(x) & \text{if } y = 1 \\ E_{cacherounds}(E_{RMH}(x), y - 1) & \text{otherwise} \end{cases}$

Where a single round modifies each subset of the cache as follows:

(240) $E_{RMH}(x) = (E_{rmh}(x, 0), E_{rmh}(x, 1), \ldots, E_{rmh}(x, n - 1))$

(241) $E_{rmh}(x, i) = \mathtt{KEC512}(x'[(i - 1 + n) \bmod n] \oplus x'[x'[i][0] \bmod n])$

with $x' = x$ except $x'[j] = E_{rmh}(x, j) \quad \forall j < i$

J.3.3. Full dataset calculation. Essentially, we combine data from $J_{parents}$ pseudorandomly selected cache nodes, and
hash that to compute the dataset. The entire dataset is then generated by a number of items, each $J_{hashbytes}$ bytes in
size:

(242) $d[i] = E_{datasetitem}(c, i) \quad \forall i < \left\lfloor \frac{d_{size}}{J_{hashbytes}} \right\rfloor$

In order to calculate the single item we use an algorithm inspired by the FNV hash (Glenn Fowler [1991]) in some cases
as a non-associative substitute for XOR.

(243) $E_{FNV}(x, y) = (x \cdot (\mathrm{0x01000193} \oplus y)) \bmod 2^{32}$

The single item of the dataset can now be calculated as:

(244) $E_{datasetitem}(c, i) = E_{parents}(c, i, -1, \varnothing)$

(245) $E_{parents}(c, i, p, m) = \begin{cases} E_{parents}(c, i, p + 1, E_{mix}(m, c, i, p + 1)) & \text{if } p < J_{parents} - 2 \\ E_{mix}(m, c, i, p + 1) & \text{otherwise} \end{cases}$

(246) $E_{mix}(m, c, i, p) = \begin{cases} \mathtt{KEC512}(c[i \bmod c_{size}] \oplus i) & \text{if } p = 0 \\ E_{FNV}(m, c[E_{FNV}(i \oplus p, m[p \bmod \lfloor J_{hashbytes} / J_{wordbytes} \rfloor]) \bmod c_{size}]) & \text{otherwise} \end{cases}$


J.4. Proof-of-work function. Essentially, we maintain a "mix" $J_{mixbytes}$ bytes wide, and repeatedly sequentially fetch
$J_{mixbytes}$ bytes from the full dataset and use the $E_{FNV}$ function to combine it with the mix. $J_{mixbytes}$ bytes of sequential
access are used so that each round of the algorithm always fetches a full page from RAM, minimizing translation lookaside
buffer misses which ASICs would theoretically be able to avoid.

If the output of this algorithm is below the desired target, then the nonce is valid. Note that the extra application
of KEC at the end ensures that there exists an intermediate nonce which can be provided to prove that at least a small
amount of work was done; this quick outer PoW verification can be used for anti-DDoS purposes. It also serves to
provide statistical assurance that the result is an unbiased, 256-bit number.

The PoW-function returns an array with the compressed mix as its first item and the Keccak-256 hash of the
concatenation of the compressed mix with the seed hash as the second item:

(247) $\mathtt{PoW}(H_{\not n}, H_n, d) = \{ m_c(\mathtt{KEC}(\mathtt{RLP}(L_H(H_{\not n}))), H_n, d), \mathtt{KEC}(s_h(\mathtt{KEC}(\mathtt{RLP}(L_H(H_{\not n}))), H_n) + m_c(\mathtt{KEC}(\mathtt{RLP}(L_H(H_{\not n}))), H_n, d)) \}$

With $H_{\not n}$ being the hash of the header without the nonce. The compressed mix $m_c$ is obtained as follows:

(248) $m_c(h, n, d) = E_{compress}(E_{accesses}(d, \sum_{i=0}^{n_{mix}} s_h(h, n), s_h(h, n), -1), -4)$

The seed hash being:

(249) $s_h(h, n) = \mathtt{KEC512}(h + E_{revert}(n))$

$E_{revert}(n)$ returns the reverted bytes sequence of the nonce n:

(250) $E_{revert}(n)[i] = n[\lVert n \rVert - i]$

We note that the "+"-operator between two byte sequences results in the concatenation of both sequences.
The dataset d is obtained as described in section J.3.3.

The number of replicated sequences in the mix is:

(251) $n_{mix} = \left\lfloor \frac{J_{mixbytes}}{J_{hashbytes}} \right\rfloor$

In order to add random dataset nodes to the mix, the $E_{accesses}$ function is used:

(252) $E_{accesses}(d, m, s, i) = \begin{cases} E_{mixdataset}(d, m, s, i) & \text{if } i = J_{accesses} - 2 \\ E_{accesses}(d, E_{mixdataset}(d, m, s, i), s, i + 1) & \text{otherwise} \end{cases}$

(253) $E_{mixdataset}(d, m, s, i) = E_{FNV}(m, E_{newdata}(d, m, s, i))$

$E_{newdata}$ returns an array with $n_{mix}$ elements:

(254) $E_{newdata}(d, m, s, i)[j] = d\left[E_{FNV}\left(i \oplus s[0], m\left[i \bmod \left\lfloor \frac{J_{mixbytes}}{J_{wordbytes}} \right\rfloor\right]\right) \bmod \left\lfloor \frac{d_{size} / J_{hashbytes}}{n_{mix}} \right\rfloor \cdot n_{mix} + j\right] \quad \forall j < n_{mix}$

The mix is compressed as follows:

(255) $E_{compress}(m, i) = \begin{cases} m & \text{if } i \geq \lVert m \rVert - 1 \\ E_{compress}(E_{FNV}(E_{FNV}(E_{FNV}(m[i + 4], m[i + 5]), m[i + 6]), m[i + 7]), i + 8) & \text{otherwise} \end{cases}$


